From patchwork Wed Jan 22 00:56:15 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Christoph Paasch X-Patchwork-Id: 1226852 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (no SPF record) smtp.mailfrom=lists.01.org (client-ip=198.145.21.10; helo=ml01.01.org; envelope-from=mptcp-bounces@lists.01.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=quarantine dis=none) header.from=apple.com Authentication-Results: ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=apple.com header.i=@apple.com header.a=rsa-sha256 header.s=20180706 header.b=d/mQmVAd; dkim-atps=neutral Received: from ml01.01.org (ml01.01.org [198.145.21.10]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 482Rnc0YX2z9sRX for ; Wed, 22 Jan 2020 11:57:07 +1100 (AEDT) Received: from ml01.vlan13.01.org (localhost [IPv6:::1]) by ml01.01.org (Postfix) with ESMTP id D8F301007B8E0; Tue, 21 Jan 2020 17:00:23 -0800 (PST) Received-SPF: Pass (mailfrom) identity=mailfrom; client-ip=17.151.62.67; helo=nwk-aaemail-lapp02.apple.com; envelope-from=cpaasch@apple.com; receiver= Received: from nwk-aaemail-lapp02.apple.com (nwk-aaemail-lapp02.apple.com [17.151.62.67]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ml01.01.org (Postfix) with ESMTPS id 18C1A1007B8C8 for ; Tue, 21 Jan 2020 17:00:22 -0800 (PST) Received: from pps.filterd (nwk-aaemail-lapp02.apple.com [127.0.0.1]) by nwk-aaemail-lapp02.apple.com (8.16.0.27/8.16.0.27) with SMTP id 00M0v2eE020009; Tue, 21 Jan 2020 16:57:02 -0800 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=apple.com; h=sender : from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding; s=20180706; bh=FX1uFqalxfDsVnErVL0hDSqSWP1nHgs75D6ZE9NvEfc=; b=d/mQmVAdilWGDZVaUXSsCI84MK0GMXTzW9tYtp3CI6vkuOhmizVHpIegnq5vsvsYjge1 UjLat6IXG27xxEQHH+YySP+4Lw3CPSj2pyTWuc43YEsZ9fAvFh1+2xoc0zB3CPeKB58k l/rXFOX8SmPUZHsx3xHeBUjiKPUktrQPlpNpmZSnCb1BscR6n49zlxZ39UP+92zWtN7c KD8HOOuTAZtDRPSF2uQBQxdXSDtzN+iv23220XW4Pgc2N14hIACaFJWE2J/tyb/+TKB0 5J/h04C/YCTYCUgmtal3cPq4v6KahPyb3+E+VstnGIuHtrUDBnc/boqgHbh9KSMUnFBJ aQ== Received: from ma1-mtap-s03.corp.apple.com (ma1-mtap-s03.corp.apple.com [17.40.76.7]) by nwk-aaemail-lapp02.apple.com with ESMTP id 2xkyfq8e4a-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NO); Tue, 21 Jan 2020 16:57:02 -0800 Received: from nwk-mmpp-sz13.apple.com (nwk-mmpp-sz13.apple.com [17.128.115.216]) by ma1-mtap-s03.corp.apple.com (Oracle Communications Messaging Server 8.0.2.4.20190507 64bit (built May 7 2019)) with ESMTPS id <0Q4H00064HAZD600@ma1-mtap-s03.corp.apple.com>; Tue, 21 Jan 2020 16:57:01 -0800 (PST) Received: from process_milters-daemon.nwk-mmpp-sz13.apple.com by nwk-mmpp-sz13.apple.com (Oracle Communications Messaging Server 8.0.2.4.20190507 64bit (built May 7 2019)) id <0Q4H00500GU2WL00@nwk-mmpp-sz13.apple.com>; Tue, 21 Jan 2020 16:56:59 -0800 (PST) X-Va-A: X-Va-T-CD: 4b1e0bf36502e052fc75ad21b706ed24 X-Va-E-CD: ed3e5ef442236977e57274b3b0ea5602 X-Va-R-CD: 4c9e62fa76176fe31a4ea587a70c1c88 X-Va-CD: 0 X-Va-ID: 9ec772a6-9480-473f-b4db-34472d414ad9 X-V-A: X-V-T-CD: 4b1e0bf36502e052fc75ad21b706ed24 X-V-E-CD: ed3e5ef442236977e57274b3b0ea5602 X-V-R-CD: 4c9e62fa76176fe31a4ea587a70c1c88 X-V-CD: 0 X-V-ID: 8759a21f-dc6f-43be-886e-e2259b82bab9 X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:,, definitions=2020-01-17_05:,, signatures=0 Received: from localhost ([17.192.155.241]) by nwk-mmpp-sz13.apple.com (Oracle Communications Messaging Server 8.0.2.4.20190507 64bit (built May 7 2019)) with ESMTPSA id <0Q4H0011NHAZ4Y50@nwk-mmpp-sz13.apple.com>; Tue, 21 Jan 2020 16:56:59 -0800 (PST) Sender: cpaasch@apple.com From: Christoph Paasch To: netdev@vger.kernel.org Date: Tue, 21 Jan 2020 16:56:15 -0800 Message-id: <20200122005633.21229-2-cpaasch@apple.com> X-Mailer: git-send-email 2.23.0 In-reply-to: <20200122005633.21229-1-cpaasch@apple.com> References: <20200122005633.21229-1-cpaasch@apple.com> MIME-version: 1.0 X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:, , definitions=2020-01-17_05:, , signatures=0 Message-ID-Hash: LDA2QXMIAHGHFGVKITINRHWIKMHUWSI3 X-Message-ID-Hash: LDA2QXMIAHGHFGVKITINRHWIKMHUWSI3 X-MailFrom: cpaasch@apple.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; suspicious-header CC: mptcp@lists.01.org X-Mailman-Version: 3.1.1 Precedence: list Subject: [MPTCP] [PATCH net-next v3 01/19] mptcp: Add MPTCP socket stubs List-Id: Discussions regarding MPTCP upstreaming Archived-At: List-Archive: List-Help: List-Post: List-Subscribe: List-Unsubscribe: From: Mat Martineau Implements the infrastructure for MPTCP sockets. MPTCP sockets open one in-kernel TCP socket per subflow. These subflow sockets are only managed by the MPTCP socket that owns them and are not visible from userspace. This commit allows a userspace program to open an MPTCP socket with: sock = socket(AF_INET, SOCK_STREAM, IPPROTO_MPTCP); The resulting socket is simply a wrapper around a single regular TCP socket, without any of the MPTCP protocol implemented over the wire. Co-developed-by: Florian Westphal Signed-off-by: Florian Westphal Co-developed-by: Peter Krystad Signed-off-by: Peter Krystad Co-developed-by: Matthieu Baerts Signed-off-by: Matthieu Baerts Co-developed-by: Paolo Abeni Signed-off-by: Paolo Abeni Signed-off-by: Mat Martineau Signed-off-by: Christoph Paasch --- MAINTAINERS | 1 + include/net/mptcp.h | 16 +++++ net/Kconfig | 1 + net/Makefile | 1 + net/ipv4/tcp.c | 2 + net/ipv6/tcp_ipv6.c | 7 +++ net/mptcp/Kconfig | 16 +++++ net/mptcp/Makefile | 4 ++ net/mptcp/protocol.c | 142 +++++++++++++++++++++++++++++++++++++++++++ net/mptcp/protocol.h | 22 +++++++ 10 files changed, 212 insertions(+) create mode 100644 net/mptcp/Kconfig create mode 100644 net/mptcp/Makefile create mode 100644 net/mptcp/protocol.c create mode 100644 net/mptcp/protocol.h diff --git a/MAINTAINERS b/MAINTAINERS index 702382b89c37..4be9c1b50183 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -11583,6 +11583,7 @@ W: https://github.com/multipath-tcp/mptcp_net-next/wiki B: https://github.com/multipath-tcp/mptcp_net-next/issues S: Maintained F: include/net/mptcp.h +F: net/mptcp/ NETWORKING [TCP] M: Eric Dumazet diff --git a/include/net/mptcp.h b/include/net/mptcp.h index 0573ae75c3db..98ba22379117 100644 --- a/include/net/mptcp.h +++ b/include/net/mptcp.h @@ -28,6 +28,8 @@ struct mptcp_ext { #ifdef CONFIG_MPTCP +void mptcp_init(void); + /* move the skb extension owership, with the assumption that 'to' is * newly allocated */ @@ -70,6 +72,10 @@ static inline bool mptcp_skb_can_collapse(const struct sk_buff *to, #else +static inline void mptcp_init(void) +{ +} + static inline void mptcp_skb_ext_move(struct sk_buff *to, const struct sk_buff *from) { @@ -82,4 +88,14 @@ static inline bool mptcp_skb_can_collapse(const struct sk_buff *to, } #endif /* CONFIG_MPTCP */ + +#if IS_ENABLED(CONFIG_MPTCP_IPV6) +int mptcpv6_init(void); +#elif IS_ENABLED(CONFIG_IPV6) +static inline int mptcpv6_init(void) +{ + return 0; +} +#endif + #endif /* __NET_MPTCP_H */ diff --git a/net/Kconfig b/net/Kconfig index 54916b7adb9b..b0937a700f01 100644 --- a/net/Kconfig +++ b/net/Kconfig @@ -91,6 +91,7 @@ if INET source "net/ipv4/Kconfig" source "net/ipv6/Kconfig" source "net/netlabel/Kconfig" +source "net/mptcp/Kconfig" endif # if INET diff --git a/net/Makefile b/net/Makefile index 848303d98d3d..07ea48160874 100644 --- a/net/Makefile +++ b/net/Makefile @@ -87,3 +87,4 @@ endif obj-$(CONFIG_QRTR) += qrtr/ obj-$(CONFIG_NET_NCSI) += ncsi/ obj-$(CONFIG_XDP_SOCKETS) += xdp/ +obj-$(CONFIG_MPTCP) += mptcp/ diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index 6711a97de3ce..7dfb78caedaf 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -271,6 +271,7 @@ #include #include #include +#include #include #include #include @@ -4021,4 +4022,5 @@ void __init tcp_init(void) tcp_metrics_init(); BUG_ON(tcp_register_congestion_control(&tcp_reno) != 0); tcp_tasklet_init(); + mptcp_init(); } diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c index 5b5260103b65..60068ffde1d9 100644 --- a/net/ipv6/tcp_ipv6.c +++ b/net/ipv6/tcp_ipv6.c @@ -2163,9 +2163,16 @@ int __init tcpv6_init(void) ret = register_pernet_subsys(&tcpv6_net_ops); if (ret) goto out_tcpv6_protosw; + + ret = mptcpv6_init(); + if (ret) + goto out_tcpv6_pernet_subsys; + out: return ret; +out_tcpv6_pernet_subsys: + unregister_pernet_subsys(&tcpv6_net_ops); out_tcpv6_protosw: inet6_unregister_protosw(&tcpv6_protosw); out_tcpv6_protocol: diff --git a/net/mptcp/Kconfig b/net/mptcp/Kconfig new file mode 100644 index 000000000000..c1a99f07a4cd --- /dev/null +++ b/net/mptcp/Kconfig @@ -0,0 +1,16 @@ + +config MPTCP + bool "MPTCP: Multipath TCP" + depends on INET + select SKB_EXTENSIONS + help + Multipath TCP (MPTCP) connections send and receive data over multiple + subflows in order to utilize multiple network paths. Each subflow + uses the TCP protocol, and TCP options carry header information for + MPTCP. + +config MPTCP_IPV6 + bool "MPTCP: IPv6 support for Multipath TCP" + depends on MPTCP + select IPV6 + default y diff --git a/net/mptcp/Makefile b/net/mptcp/Makefile new file mode 100644 index 000000000000..659129d1fcbf --- /dev/null +++ b/net/mptcp/Makefile @@ -0,0 +1,4 @@ +# SPDX-License-Identifier: GPL-2.0 +obj-$(CONFIG_MPTCP) += mptcp.o + +mptcp-y := protocol.o diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c new file mode 100644 index 000000000000..5e24e7cf7d70 --- /dev/null +++ b/net/mptcp/protocol.c @@ -0,0 +1,142 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Multipath TCP + * + * Copyright (c) 2017 - 2019, Intel Corporation. + */ + +#define pr_fmt(fmt) "MPTCP: " fmt + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include "protocol.h" + +static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len) +{ + struct mptcp_sock *msk = mptcp_sk(sk); + struct socket *subflow = msk->subflow; + + if (msg->msg_flags & ~(MSG_MORE | MSG_DONTWAIT | MSG_NOSIGNAL)) + return -EOPNOTSUPP; + + return sock_sendmsg(subflow, msg); +} + +static int mptcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, + int nonblock, int flags, int *addr_len) +{ + struct mptcp_sock *msk = mptcp_sk(sk); + struct socket *subflow = msk->subflow; + + if (msg->msg_flags & ~(MSG_WAITALL | MSG_DONTWAIT)) + return -EOPNOTSUPP; + + return sock_recvmsg(subflow, msg, flags); +} + +static int mptcp_init_sock(struct sock *sk) +{ + return 0; +} + +static void mptcp_close(struct sock *sk, long timeout) +{ + struct mptcp_sock *msk = mptcp_sk(sk); + + inet_sk_state_store(sk, TCP_CLOSE); + + if (msk->subflow) { + pr_debug("subflow=%p", msk->subflow->sk); + sock_release(msk->subflow); + } + + sock_orphan(sk); + sock_put(sk); +} + +static int mptcp_connect(struct sock *sk, struct sockaddr *saddr, int len) +{ + struct mptcp_sock *msk = mptcp_sk(sk); + int err; + + saddr->sa_family = AF_INET; + + pr_debug("msk=%p, subflow=%p", msk, msk->subflow->sk); + + err = kernel_connect(msk->subflow, saddr, len, 0); + + sk->sk_state = TCP_ESTABLISHED; + + return err; +} + +static struct proto mptcp_prot = { + .name = "MPTCP", + .owner = THIS_MODULE, + .init = mptcp_init_sock, + .close = mptcp_close, + .accept = inet_csk_accept, + .connect = mptcp_connect, + .shutdown = tcp_shutdown, + .sendmsg = mptcp_sendmsg, + .recvmsg = mptcp_recvmsg, + .hash = inet_hash, + .unhash = inet_unhash, + .get_port = inet_csk_get_port, + .obj_size = sizeof(struct mptcp_sock), + .no_autobind = true, +}; + +static struct inet_protosw mptcp_protosw = { + .type = SOCK_STREAM, + .protocol = IPPROTO_MPTCP, + .prot = &mptcp_prot, + .ops = &inet_stream_ops, +}; + +void __init mptcp_init(void) +{ + if (proto_register(&mptcp_prot, 1) != 0) + panic("Failed to register MPTCP proto.\n"); + + inet_register_protosw(&mptcp_protosw); +} + +#if IS_ENABLED(CONFIG_MPTCP_IPV6) +static struct proto mptcp_v6_prot; + +static struct inet_protosw mptcp_v6_protosw = { + .type = SOCK_STREAM, + .protocol = IPPROTO_MPTCP, + .prot = &mptcp_v6_prot, + .ops = &inet6_stream_ops, + .flags = INET_PROTOSW_ICSK, +}; + +int mptcpv6_init(void) +{ + int err; + + mptcp_v6_prot = mptcp_prot; + strcpy(mptcp_v6_prot.name, "MPTCPv6"); + mptcp_v6_prot.slab = NULL; + mptcp_v6_prot.obj_size = sizeof(struct mptcp_sock) + + sizeof(struct ipv6_pinfo); + + err = proto_register(&mptcp_v6_prot, 1); + if (err) + return err; + + err = inet6_register_protosw(&mptcp_v6_protosw); + if (err) + proto_unregister(&mptcp_v6_prot); + + return err; +} +#endif diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h new file mode 100644 index 000000000000..ee04a01bffd3 --- /dev/null +++ b/net/mptcp/protocol.h @@ -0,0 +1,22 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* Multipath TCP + * + * Copyright (c) 2017 - 2019, Intel Corporation. + */ + +#ifndef __MPTCP_PROTOCOL_H +#define __MPTCP_PROTOCOL_H + +/* MPTCP connection sock */ +struct mptcp_sock { + /* inet_connection_sock must be the first member */ + struct inet_connection_sock sk; + struct socket *subflow; /* outgoing connect/listener/!mp_capable */ +}; + +static inline struct mptcp_sock *mptcp_sk(const struct sock *sk) +{ + return (struct mptcp_sock *)sk; +} + +#endif /* __MPTCP_PROTOCOL_H */ From patchwork Wed Jan 22 00:56:16 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Christoph Paasch X-Patchwork-Id: 1226857 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (no SPF record) smtp.mailfrom=lists.01.org (client-ip=198.145.21.10; helo=ml01.01.org; envelope-from=mptcp-bounces@lists.01.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=quarantine dis=none) header.from=apple.com Authentication-Results: ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=apple.com header.i=@apple.com header.a=rsa-sha256 header.s=20180706 header.b=KeXfqc47; dkim-atps=neutral Received: from ml01.01.org (ml01.01.org [198.145.21.10]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 482Rnf04bBz9sRX for ; Wed, 22 Jan 2020 11:57:09 +1100 (AEDT) Received: from ml01.vlan13.01.org (localhost [IPv6:::1]) by ml01.01.org (Postfix) with ESMTP id 1C5EC1007B8EE; Tue, 21 Jan 2020 17:00:26 -0800 (PST) Received-SPF: Pass (mailfrom) identity=mailfrom; client-ip=17.151.62.68; helo=nwk-aaemail-lapp03.apple.com; envelope-from=cpaasch@apple.com; receiver= Received: from nwk-aaemail-lapp03.apple.com (nwk-aaemail-lapp03.apple.com [17.151.62.68]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ml01.01.org (Postfix) with ESMTPS id 2932D1007B8E6 for ; Tue, 21 Jan 2020 17:00:24 -0800 (PST) Received: from pps.filterd (nwk-aaemail-lapp03.apple.com [127.0.0.1]) by nwk-aaemail-lapp03.apple.com (8.16.0.27/8.16.0.27) with SMTP id 00M0urnU013197; Tue, 21 Jan 2020 16:57:05 -0800 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=apple.com; h=sender : from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding; s=20180706; bh=3wLfVErYu8Ok/fd1Az15LlLLwPbv63mCtbZyrlS4iHA=; b=KeXfqc47/zFEVZ2bnaP/lCGDi4eSgOe4nHnne21vRYdtpZS1iVpiXtyBobIiKPik8MhS ImN0NYwsX269t8MNNswNKrI+ebBAOnhxvTf905ZsFvjsk+B4K1fR5yRpCreNaLDOS1aW w2nLiWYm/hr/JiR+UddzX3cABm3AA8CB6u3AAgJGADI62GE6cyKLZYvsAEyvfTA8quhS jcqrIWVQVS4T+gYIdLz75x/It9pC1iqlffkFHWhu3t3c1e5szUgGKo4uwZGw21WyugVw 3SACT4Nffkt7KU9I75bi+uchYGFIwX4Yzsc/6cvyVr7rrzFcamfJKG+9zPpgs/q0P9lE 1A== Received: from ma1-mtap-s01.corp.apple.com (ma1-mtap-s01.corp.apple.com [17.40.76.5]) by nwk-aaemail-lapp03.apple.com with ESMTP id 2xmk4p1699-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NO); Tue, 21 Jan 2020 16:57:05 -0800 Received: from nwk-mmpp-sz13.apple.com (nwk-mmpp-sz13.apple.com [17.128.115.216]) by ma1-mtap-s01.corp.apple.com (Oracle Communications Messaging Server 8.0.2.4.20190507 64bit (built May 7 2019)) with ESMTPS id <0Q4H00HWQHAUJ720@ma1-mtap-s01.corp.apple.com>; Tue, 21 Jan 2020 16:57:01 -0800 (PST) Received: from process_milters-daemon.nwk-mmpp-sz13.apple.com by nwk-mmpp-sz13.apple.com (Oracle Communications Messaging Server 8.0.2.4.20190507 64bit (built May 7 2019)) id <0Q4H00F00F5G4K00@nwk-mmpp-sz13.apple.com>; Tue, 21 Jan 2020 16:56:59 -0800 (PST) X-Va-A: X-Va-T-CD: 4b1e0bf36502e052fc75ad21b706ed24 X-Va-E-CD: c051a37ac4e5843d47ecc64fcf48afb5 X-Va-R-CD: f0ad5c29aa74349f82409ccb4d6aa844 X-Va-CD: 0 X-Va-ID: 469c224d-cdcc-439d-a1cc-6afe209d191c X-V-A: X-V-T-CD: 4b1e0bf36502e052fc75ad21b706ed24 X-V-E-CD: c051a37ac4e5843d47ecc64fcf48afb5 X-V-R-CD: f0ad5c29aa74349f82409ccb4d6aa844 X-V-CD: 0 X-V-ID: 09427620-70d1-44b6-8c18-328a9fc87aee X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:,, definitions=2020-01-17_05:,, signatures=0 Received: from localhost ([17.192.155.241]) by nwk-mmpp-sz13.apple.com (Oracle Communications Messaging Server 8.0.2.4.20190507 64bit (built May 7 2019)) with ESMTPSA id <0Q4H00DS6HAZDC30@nwk-mmpp-sz13.apple.com>; Tue, 21 Jan 2020 16:56:59 -0800 (PST) Sender: cpaasch@apple.com From: Christoph Paasch To: netdev@vger.kernel.org Date: Tue, 21 Jan 2020 16:56:16 -0800 Message-id: <20200122005633.21229-3-cpaasch@apple.com> X-Mailer: git-send-email 2.23.0 In-reply-to: <20200122005633.21229-1-cpaasch@apple.com> References: <20200122005633.21229-1-cpaasch@apple.com> MIME-version: 1.0 X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:, , definitions=2020-01-17_05:, , signatures=0 Message-ID-Hash: DUAGVZAO2FMSPNJMOQ7GUGCWEGBOSME7 X-Message-ID-Hash: DUAGVZAO2FMSPNJMOQ7GUGCWEGBOSME7 X-MailFrom: cpaasch@apple.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; suspicious-header CC: mptcp@lists.01.org X-Mailman-Version: 3.1.1 Precedence: list Subject: [MPTCP] [PATCH net-next v3 02/19] mptcp: Handle MPTCP TCP options List-Id: Discussions regarding MPTCP upstreaming Archived-At: List-Archive: List-Help: List-Post: List-Subscribe: List-Unsubscribe: From: Peter Krystad Add hooks to parse and format the MP_CAPABLE option. This option is handled according to MPTCP version 0 (RFC6824). MPTCP version 1 MP_CAPABLE (RFC6824bis/RFC8684) will be added later in coordination with related code changes. Co-developed-by: Matthieu Baerts Signed-off-by: Matthieu Baerts Co-developed-by: Florian Westphal Signed-off-by: Florian Westphal Co-developed-by: Davide Caratti Signed-off-by: Davide Caratti Signed-off-by: Peter Krystad Signed-off-by: Christoph Paasch --- include/linux/tcp.h | 18 ++++++++ include/net/mptcp.h | 18 ++++++++ net/ipv4/tcp_input.c | 5 +++ net/ipv4/tcp_output.c | 13 ++++++ net/mptcp/Makefile | 2 +- net/mptcp/options.c | 97 +++++++++++++++++++++++++++++++++++++++++++ net/mptcp/protocol.h | 29 +++++++++++++ 7 files changed, 181 insertions(+), 1 deletion(-) create mode 100644 net/mptcp/options.c diff --git a/include/linux/tcp.h b/include/linux/tcp.h index ca6f01531e64..52798ab00394 100644 --- a/include/linux/tcp.h +++ b/include/linux/tcp.h @@ -78,6 +78,16 @@ struct tcp_sack_block { #define TCP_SACK_SEEN (1 << 0) /*1 = peer is SACK capable, */ #define TCP_DSACK_SEEN (1 << 2) /*1 = DSACK was received from peer*/ +#if IS_ENABLED(CONFIG_MPTCP) +struct mptcp_options_received { + u64 sndr_key; + u64 rcvr_key; + u8 mp_capable : 1, + mp_join : 1, + dss : 1; +}; +#endif + struct tcp_options_received { /* PAWS/RTTM data */ int ts_recent_stamp;/* Time we stored ts_recent (for aging) */ @@ -95,6 +105,9 @@ struct tcp_options_received { u8 num_sacks; /* Number of SACK blocks */ u16 user_mss; /* mss requested by user in ioctl */ u16 mss_clamp; /* Maximal mss, negotiated at connection setup */ +#if IS_ENABLED(CONFIG_MPTCP) + struct mptcp_options_received mptcp; +#endif }; static inline void tcp_clear_options(struct tcp_options_received *rx_opt) @@ -104,6 +117,11 @@ static inline void tcp_clear_options(struct tcp_options_received *rx_opt) #if IS_ENABLED(CONFIG_SMC) rx_opt->smc_ok = 0; #endif +#if IS_ENABLED(CONFIG_MPTCP) + rx_opt->mptcp.mp_capable = 0; + rx_opt->mptcp.mp_join = 0; + rx_opt->mptcp.dss = 0; +#endif } /* This is the max number of SACKS that we'll generate and process. It's safe diff --git a/include/net/mptcp.h b/include/net/mptcp.h index 98ba22379117..3daec2ceb3ff 100644 --- a/include/net/mptcp.h +++ b/include/net/mptcp.h @@ -9,6 +9,7 @@ #define __NET_MPTCP_H #include +#include #include /* MPTCP sk_buff extension data */ @@ -26,10 +27,22 @@ struct mptcp_ext { /* one byte hole */ }; +struct mptcp_out_options { +#if IS_ENABLED(CONFIG_MPTCP) + u16 suboptions; + u64 sndr_key; + u64 rcvr_key; +#endif +}; + #ifdef CONFIG_MPTCP void mptcp_init(void); +void mptcp_parse_option(const unsigned char *ptr, int opsize, + struct tcp_options_received *opt_rx); +void mptcp_write_options(__be32 *ptr, struct mptcp_out_options *opts); + /* move the skb extension owership, with the assumption that 'to' is * newly allocated */ @@ -76,6 +89,11 @@ static inline void mptcp_init(void) { } +static inline void mptcp_parse_option(const unsigned char *ptr, int opsize, + struct tcp_options_received *opt_rx) +{ +} + static inline void mptcp_skb_ext_move(struct sk_buff *to, const struct sk_buff *from) { diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 358365598216..3458ee13e6f0 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -79,6 +79,7 @@ #include #include #include +#include int sysctl_tcp_max_orphans __read_mostly = NR_FILE; @@ -3924,6 +3925,10 @@ void tcp_parse_options(const struct net *net, */ break; #endif + case TCPOPT_MPTCP: + mptcp_parse_option(ptr, opsize, opt_rx); + break; + case TCPOPT_FASTOPEN: tcp_parse_fastopen_option( opsize - TCPOLEN_FASTOPEN_BASE, diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index 05109d0c675b..b6cf87b02b01 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -38,6 +38,7 @@ #define pr_fmt(fmt) "TCP: " fmt #include +#include #include #include @@ -414,6 +415,7 @@ static inline bool tcp_urg_mode(const struct tcp_sock *tp) #define OPTION_WSCALE (1 << 3) #define OPTION_FAST_OPEN_COOKIE (1 << 8) #define OPTION_SMC (1 << 9) +#define OPTION_MPTCP (1 << 10) static void smc_options_write(__be32 *ptr, u16 *options) { @@ -439,8 +441,17 @@ struct tcp_out_options { __u8 *hash_location; /* temporary pointer, overloaded */ __u32 tsval, tsecr; /* need to include OPTION_TS */ struct tcp_fastopen_cookie *fastopen_cookie; /* Fast open cookie */ + struct mptcp_out_options mptcp; }; +static void mptcp_options_write(__be32 *ptr, struct tcp_out_options *opts) +{ +#if IS_ENABLED(CONFIG_MPTCP) + if (unlikely(OPTION_MPTCP & opts->options)) + mptcp_write_options(ptr, &opts->mptcp); +#endif +} + /* Write previously computed TCP options to the packet. * * Beware: Something in the Internet is very sensitive to the ordering of @@ -549,6 +560,8 @@ static void tcp_options_write(__be32 *ptr, struct tcp_sock *tp, } smc_options_write(ptr, &options); + + mptcp_options_write(ptr, opts); } static void smc_set_option(const struct tcp_sock *tp, diff --git a/net/mptcp/Makefile b/net/mptcp/Makefile index 659129d1fcbf..27a846263f08 100644 --- a/net/mptcp/Makefile +++ b/net/mptcp/Makefile @@ -1,4 +1,4 @@ # SPDX-License-Identifier: GPL-2.0 obj-$(CONFIG_MPTCP) += mptcp.o -mptcp-y := protocol.o +mptcp-y := protocol.o options.o diff --git a/net/mptcp/options.c b/net/mptcp/options.c new file mode 100644 index 000000000000..b7a31c0e5283 --- /dev/null +++ b/net/mptcp/options.c @@ -0,0 +1,97 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Multipath TCP + * + * Copyright (c) 2017 - 2019, Intel Corporation. + */ + +#include +#include +#include +#include "protocol.h" + +void mptcp_parse_option(const unsigned char *ptr, int opsize, + struct tcp_options_received *opt_rx) +{ + struct mptcp_options_received *mp_opt = &opt_rx->mptcp; + u8 subtype = *ptr >> 4; + u8 version; + u8 flags; + + switch (subtype) { + case MPTCPOPT_MP_CAPABLE: + if (opsize != TCPOLEN_MPTCP_MPC_SYN && + opsize != TCPOLEN_MPTCP_MPC_ACK) + break; + + version = *ptr++ & MPTCP_VERSION_MASK; + if (version != MPTCP_SUPPORTED_VERSION) + break; + + flags = *ptr++; + if (!((flags & MPTCP_CAP_FLAG_MASK) == MPTCP_CAP_HMAC_SHA1) || + (flags & MPTCP_CAP_EXTENSIBILITY)) + break; + + /* RFC 6824, Section 3.1: + * "For the Checksum Required bit (labeled "A"), if either + * host requires the use of checksums, checksums MUST be used. + * In other words, the only way for checksums not to be used + * is if both hosts in their SYNs set A=0." + * + * Section 3.3.0: + * "If a checksum is not present when its use has been + * negotiated, the receiver MUST close the subflow with a RST as + * it is considered broken." + * + * We don't implement DSS checksum - fall back to TCP. + */ + if (flags & MPTCP_CAP_CHECKSUM_REQD) + break; + + mp_opt->mp_capable = 1; + mp_opt->sndr_key = get_unaligned_be64(ptr); + ptr += 8; + + if (opsize == TCPOLEN_MPTCP_MPC_ACK) { + mp_opt->rcvr_key = get_unaligned_be64(ptr); + ptr += 8; + pr_debug("MP_CAPABLE sndr=%llu, rcvr=%llu", + mp_opt->sndr_key, mp_opt->rcvr_key); + } else { + pr_debug("MP_CAPABLE sndr=%llu", mp_opt->sndr_key); + } + break; + + case MPTCPOPT_DSS: + pr_debug("DSS"); + mp_opt->dss = 1; + break; + + default: + break; + } +} + +void mptcp_write_options(__be32 *ptr, struct mptcp_out_options *opts) +{ + if ((OPTION_MPTCP_MPC_SYN | + OPTION_MPTCP_MPC_ACK) & opts->suboptions) { + u8 len; + + if (OPTION_MPTCP_MPC_SYN & opts->suboptions) + len = TCPOLEN_MPTCP_MPC_SYN; + else + len = TCPOLEN_MPTCP_MPC_ACK; + + *ptr++ = htonl((TCPOPT_MPTCP << 24) | (len << 16) | + (MPTCPOPT_MP_CAPABLE << 12) | + (MPTCP_SUPPORTED_VERSION << 8) | + MPTCP_CAP_HMAC_SHA1); + put_unaligned_be64(opts->sndr_key, ptr); + ptr += 2; + if (OPTION_MPTCP_MPC_ACK & opts->suboptions) { + put_unaligned_be64(opts->rcvr_key, ptr); + ptr += 2; + } + } +} diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h index ee04a01bffd3..c59cf8b220b0 100644 --- a/net/mptcp/protocol.h +++ b/net/mptcp/protocol.h @@ -7,6 +7,35 @@ #ifndef __MPTCP_PROTOCOL_H #define __MPTCP_PROTOCOL_H +#define MPTCP_SUPPORTED_VERSION 0 + +/* MPTCP option bits */ +#define OPTION_MPTCP_MPC_SYN BIT(0) +#define OPTION_MPTCP_MPC_SYNACK BIT(1) +#define OPTION_MPTCP_MPC_ACK BIT(2) + +/* MPTCP option subtypes */ +#define MPTCPOPT_MP_CAPABLE 0 +#define MPTCPOPT_MP_JOIN 1 +#define MPTCPOPT_DSS 2 +#define MPTCPOPT_ADD_ADDR 3 +#define MPTCPOPT_RM_ADDR 4 +#define MPTCPOPT_MP_PRIO 5 +#define MPTCPOPT_MP_FAIL 6 +#define MPTCPOPT_MP_FASTCLOSE 7 + +/* MPTCP suboption lengths */ +#define TCPOLEN_MPTCP_MPC_SYN 12 +#define TCPOLEN_MPTCP_MPC_SYNACK 12 +#define TCPOLEN_MPTCP_MPC_ACK 20 + +/* MPTCP MP_CAPABLE flags */ +#define MPTCP_VERSION_MASK (0x0F) +#define MPTCP_CAP_CHECKSUM_REQD BIT(7) +#define MPTCP_CAP_EXTENSIBILITY BIT(6) +#define MPTCP_CAP_HMAC_SHA1 BIT(0) +#define MPTCP_CAP_FLAG_MASK (0x3F) + /* MPTCP connection sock */ struct mptcp_sock { /* inet_connection_sock must be the first member */ From patchwork Wed Jan 22 00:56:17 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Christoph Paasch X-Patchwork-Id: 1226853 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (no SPF record) smtp.mailfrom=lists.01.org (client-ip=198.145.21.10; helo=ml01.01.org; envelope-from=mptcp-bounces@lists.01.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=quarantine dis=none) header.from=apple.com Authentication-Results: ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=apple.com header.i=@apple.com header.a=rsa-sha256 header.s=20180706 header.b=Jp94Vh2f; dkim-atps=neutral Received: from ml01.01.org (ml01.01.org [198.145.21.10]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 482Rnc0vKPz9sRk for ; Wed, 22 Jan 2020 11:57:07 +1100 (AEDT) Received: from ml01.vlan13.01.org (localhost [IPv6:::1]) by ml01.01.org (Postfix) with ESMTP id E31061007B8E4; Tue, 21 Jan 2020 17:00:23 -0800 (PST) Received-SPF: Pass (mailfrom) identity=mailfrom; client-ip=17.151.62.67; helo=nwk-aaemail-lapp02.apple.com; envelope-from=cpaasch@apple.com; receiver= Received: from nwk-aaemail-lapp02.apple.com (nwk-aaemail-lapp02.apple.com [17.151.62.67]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ml01.01.org (Postfix) with ESMTPS id BA5081007B8C8 for ; Tue, 21 Jan 2020 17:00:22 -0800 (PST) Received: from pps.filterd (nwk-aaemail-lapp02.apple.com [127.0.0.1]) by nwk-aaemail-lapp02.apple.com (8.16.0.27/8.16.0.27) with SMTP id 00M0v2eI020009; Tue, 21 Jan 2020 16:57:04 -0800 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=apple.com; h=sender : from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding; s=20180706; bh=gs6UKR2suQdThYh8C1E9iJlbaNIwTovlw7WovgIOVQo=; b=Jp94Vh2f5YuM5gK4wfXd9gBp2+Im7pv1nFHBdOjwlAwgJ/XKj3Iib46Dnrt4bvy0cZOF nZvhzvlqkflD9oTPvHK2ydmIe4b+pVby4lKc5g/fOMJexF2XQ5eLQ3pTUboCQo0EPMNV 8nn5c0K3SEH9lOHVsG+nMiDrlHClyle82e93CvvWNDVzLqQZKfSqHiv2k5DzDGhw7UTs PozrFvs6EVy5LBim2J/0hklfbuxYvmbELk6PX6lQXJn/WXNOlyup1ynjQk99J1KFNhcH TfJC0uN/52jqoxGsZLcm/CBRUhR7H8DJMzKpy3S1Ak7O02wvpWkM5ZpbeKGGtShCn9Wz fA== Received: from ma1-mtap-s03.corp.apple.com (ma1-mtap-s03.corp.apple.com [17.40.76.7]) by nwk-aaemail-lapp02.apple.com with ESMTP id 2xkyfq8e4a-5 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NO); Tue, 21 Jan 2020 16:57:04 -0800 Received: from nwk-mmpp-sz13.apple.com (nwk-mmpp-sz13.apple.com [17.128.115.216]) by ma1-mtap-s03.corp.apple.com (Oracle Communications Messaging Server 8.0.2.4.20190507 64bit (built May 7 2019)) with ESMTPS id <0Q4H00064HAZD600@ma1-mtap-s03.corp.apple.com>; Tue, 21 Jan 2020 16:57:02 -0800 (PST) Received: from process_milters-daemon.nwk-mmpp-sz13.apple.com by nwk-mmpp-sz13.apple.com (Oracle Communications Messaging Server 8.0.2.4.20190507 64bit (built May 7 2019)) id <0Q4H00F00F5G4K00@nwk-mmpp-sz13.apple.com>; Tue, 21 Jan 2020 16:57:02 -0800 (PST) X-Va-A: X-Va-T-CD: 4b1e0bf36502e052fc75ad21b706ed24 X-Va-E-CD: 60a0f09f847512109065e610d7263197 X-Va-R-CD: 4197ca1c26058c3761cc0c7df3fab283 X-Va-CD: 0 X-Va-ID: d0f6f541-a672-43a5-bc90-d061696eed6a X-V-A: X-V-T-CD: 4b1e0bf36502e052fc75ad21b706ed24 X-V-E-CD: 60a0f09f847512109065e610d7263197 X-V-R-CD: 4197ca1c26058c3761cc0c7df3fab283 X-V-CD: 0 X-V-ID: 30fc6e5d-b89f-4087-99dc-c11b64f7cae2 X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:,, definitions=2020-01-17_05:,, signatures=0 Received: from localhost ([17.192.155.241]) by nwk-mmpp-sz13.apple.com (Oracle Communications Messaging Server 8.0.2.4.20190507 64bit (built May 7 2019)) with ESMTPSA id <0Q4H0011RHAZ4Y50@nwk-mmpp-sz13.apple.com>; Tue, 21 Jan 2020 16:56:59 -0800 (PST) Sender: cpaasch@apple.com From: Christoph Paasch To: netdev@vger.kernel.org Date: Tue, 21 Jan 2020 16:56:17 -0800 Message-id: <20200122005633.21229-4-cpaasch@apple.com> X-Mailer: git-send-email 2.23.0 In-reply-to: <20200122005633.21229-1-cpaasch@apple.com> References: <20200122005633.21229-1-cpaasch@apple.com> MIME-version: 1.0 X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:, , definitions=2020-01-17_05:, , signatures=0 Message-ID-Hash: XTZLTIJXKHNIHH4JBYPL2IJVNAYRNRGX X-Message-ID-Hash: XTZLTIJXKHNIHH4JBYPL2IJVNAYRNRGX X-MailFrom: cpaasch@apple.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; suspicious-header CC: mptcp@lists.01.org X-Mailman-Version: 3.1.1 Precedence: list Subject: [MPTCP] [PATCH net-next v3 03/19] mptcp: Associate MPTCP context with TCP socket List-Id: Discussions regarding MPTCP upstreaming Archived-At: List-Archive: List-Help: List-Post: List-Subscribe: List-Unsubscribe: From: Peter Krystad Use ULP to associate a subflow_context structure with each TCP subflow socket. Creating these sockets requires new bind and connect functions to make sure ULP is set up immediately when the subflow sockets are created. Co-developed-by: Florian Westphal Signed-off-by: Florian Westphal Co-developed-by: Matthieu Baerts Signed-off-by: Matthieu Baerts Co-developed-by: Davide Caratti Signed-off-by: Davide Caratti Co-developed-by: Paolo Abeni Signed-off-by: Paolo Abeni Signed-off-by: Peter Krystad Signed-off-by: Christoph Paasch --- include/linux/tcp.h | 3 + net/mptcp/Makefile | 2 +- net/mptcp/protocol.c | 132 +++++++++++++++++++++++++++++++++++++++++-- net/mptcp/protocol.h | 26 +++++++++ net/mptcp/subflow.c | 119 ++++++++++++++++++++++++++++++++++++++ 5 files changed, 275 insertions(+), 7 deletions(-) create mode 100644 net/mptcp/subflow.c diff --git a/include/linux/tcp.h b/include/linux/tcp.h index 52798ab00394..877947475814 100644 --- a/include/linux/tcp.h +++ b/include/linux/tcp.h @@ -397,6 +397,9 @@ struct tcp_sock { u32 mtu_info; /* We received an ICMP_FRAG_NEEDED / ICMPV6_PKT_TOOBIG * while socket was owned by user. */ +#if IS_ENABLED(CONFIG_MPTCP) + bool is_mptcp; +#endif #ifdef CONFIG_TCP_MD5SIG /* TCP AF-Specific parts; only used by MD5 Signature support so far */ diff --git a/net/mptcp/Makefile b/net/mptcp/Makefile index 27a846263f08..e1ee5aade8b0 100644 --- a/net/mptcp/Makefile +++ b/net/mptcp/Makefile @@ -1,4 +1,4 @@ # SPDX-License-Identifier: GPL-2.0 obj-$(CONFIG_MPTCP) += mptcp.o -mptcp-y := protocol.o options.o +mptcp-y := protocol.o subflow.o options.o diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c index 5e24e7cf7d70..294b03a0393a 100644 --- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -17,6 +17,53 @@ #include #include "protocol.h" +#define MPTCP_SAME_STATE TCP_MAX_STATES + +/* If msk has an initial subflow socket, and the MP_CAPABLE handshake has not + * completed yet or has failed, return the subflow socket. + * Otherwise return NULL. + */ +static struct socket *__mptcp_nmpc_socket(const struct mptcp_sock *msk) +{ + if (!msk->subflow) + return NULL; + + return msk->subflow; +} + +static bool __mptcp_can_create_subflow(const struct mptcp_sock *msk) +{ + return ((struct sock *)msk)->sk_state == TCP_CLOSE; +} + +static struct socket *__mptcp_socket_create(struct mptcp_sock *msk, int state) +{ + struct mptcp_subflow_context *subflow; + struct sock *sk = (struct sock *)msk; + struct socket *ssock; + int err; + + ssock = __mptcp_nmpc_socket(msk); + if (ssock) + goto set_state; + + if (!__mptcp_can_create_subflow(msk)) + return ERR_PTR(-EINVAL); + + err = mptcp_subflow_create_socket(sk, &ssock); + if (err) + return ERR_PTR(err); + + msk->subflow = ssock; + subflow = mptcp_subflow_ctx(ssock->sk); + subflow->request_mptcp = 1; + +set_state: + if (state != MPTCP_SAME_STATE) + inet_sk_state_store(sk, state); + return ssock; +} + static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len) { struct mptcp_sock *msk = mptcp_sk(sk); @@ -48,12 +95,14 @@ static int mptcp_init_sock(struct sock *sk) static void mptcp_close(struct sock *sk, long timeout) { struct mptcp_sock *msk = mptcp_sk(sk); + struct socket *ssock; inet_sk_state_store(sk, TCP_CLOSE); - if (msk->subflow) { - pr_debug("subflow=%p", msk->subflow->sk); - sock_release(msk->subflow); + ssock = __mptcp_nmpc_socket(msk); + if (ssock) { + pr_debug("subflow=%p", mptcp_subflow_ctx(ssock->sk)); + sock_release(ssock); } sock_orphan(sk); @@ -67,7 +116,8 @@ static int mptcp_connect(struct sock *sk, struct sockaddr *saddr, int len) saddr->sa_family = AF_INET; - pr_debug("msk=%p, subflow=%p", msk, msk->subflow->sk); + pr_debug("msk=%p, subflow=%p", msk, + mptcp_subflow_ctx(msk->subflow->sk)); err = kernel_connect(msk->subflow, saddr, len, 0); @@ -93,15 +143,79 @@ static struct proto mptcp_prot = { .no_autobind = true, }; +static int mptcp_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len) +{ + struct mptcp_sock *msk = mptcp_sk(sock->sk); + struct socket *ssock; + int err = -ENOTSUPP; + + if (uaddr->sa_family != AF_INET) // @@ allow only IPv4 for now + return err; + + lock_sock(sock->sk); + ssock = __mptcp_socket_create(msk, MPTCP_SAME_STATE); + if (IS_ERR(ssock)) { + err = PTR_ERR(ssock); + goto unlock; + } + + err = ssock->ops->bind(ssock, uaddr, addr_len); + +unlock: + release_sock(sock->sk); + return err; +} + +static int mptcp_stream_connect(struct socket *sock, struct sockaddr *uaddr, + int addr_len, int flags) +{ + struct mptcp_sock *msk = mptcp_sk(sock->sk); + struct socket *ssock; + int err; + + lock_sock(sock->sk); + ssock = __mptcp_socket_create(msk, TCP_SYN_SENT); + if (IS_ERR(ssock)) { + err = PTR_ERR(ssock); + goto unlock; + } + + err = ssock->ops->connect(ssock, uaddr, addr_len, flags); + inet_sk_state_store(sock->sk, inet_sk_state_load(ssock->sk)); + +unlock: + release_sock(sock->sk); + return err; +} + +static __poll_t mptcp_poll(struct file *file, struct socket *sock, + struct poll_table_struct *wait) +{ + __poll_t mask = 0; + + return mask; +} + +static struct proto_ops mptcp_stream_ops; + static struct inet_protosw mptcp_protosw = { .type = SOCK_STREAM, .protocol = IPPROTO_MPTCP, .prot = &mptcp_prot, - .ops = &inet_stream_ops, + .ops = &mptcp_stream_ops, + .flags = INET_PROTOSW_ICSK, }; void __init mptcp_init(void) { + mptcp_prot.h.hashinfo = tcp_prot.h.hashinfo; + mptcp_stream_ops = inet_stream_ops; + mptcp_stream_ops.bind = mptcp_bind; + mptcp_stream_ops.connect = mptcp_stream_connect; + mptcp_stream_ops.poll = mptcp_poll; + + mptcp_subflow_init(); + if (proto_register(&mptcp_prot, 1) != 0) panic("Failed to register MPTCP proto.\n"); @@ -109,13 +223,14 @@ void __init mptcp_init(void) } #if IS_ENABLED(CONFIG_MPTCP_IPV6) +static struct proto_ops mptcp_v6_stream_ops; static struct proto mptcp_v6_prot; static struct inet_protosw mptcp_v6_protosw = { .type = SOCK_STREAM, .protocol = IPPROTO_MPTCP, .prot = &mptcp_v6_prot, - .ops = &inet6_stream_ops, + .ops = &mptcp_v6_stream_ops, .flags = INET_PROTOSW_ICSK, }; @@ -133,6 +248,11 @@ int mptcpv6_init(void) if (err) return err; + mptcp_v6_stream_ops = inet6_stream_ops; + mptcp_v6_stream_ops.bind = mptcp_bind; + mptcp_v6_stream_ops.connect = mptcp_stream_connect; + mptcp_v6_stream_ops.poll = mptcp_poll; + err = inet6_register_protosw(&mptcp_v6_protosw); if (err) proto_unregister(&mptcp_v6_prot); diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h index c59cf8b220b0..543d4d5d8985 100644 --- a/net/mptcp/protocol.h +++ b/net/mptcp/protocol.h @@ -48,4 +48,30 @@ static inline struct mptcp_sock *mptcp_sk(const struct sock *sk) return (struct mptcp_sock *)sk; } +/* MPTCP subflow context */ +struct mptcp_subflow_context { + u32 request_mptcp : 1; /* send MP_CAPABLE */ + struct sock *tcp_sock; /* tcp sk backpointer */ + struct sock *conn; /* parent mptcp_sock */ + struct rcu_head rcu; +}; + +static inline struct mptcp_subflow_context * +mptcp_subflow_ctx(const struct sock *sk) +{ + struct inet_connection_sock *icsk = inet_csk(sk); + + /* Use RCU on icsk_ulp_data only for sock diag code */ + return (__force struct mptcp_subflow_context *)icsk->icsk_ulp_data; +} + +static inline struct sock * +mptcp_subflow_tcp_sock(const struct mptcp_subflow_context *subflow) +{ + return subflow->tcp_sock; +} + +void mptcp_subflow_init(void); +int mptcp_subflow_create_socket(struct sock *sk, struct socket **new_sock); + #endif /* __MPTCP_PROTOCOL_H */ diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c new file mode 100644 index 000000000000..bf8139353653 --- /dev/null +++ b/net/mptcp/subflow.c @@ -0,0 +1,119 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Multipath TCP + * + * Copyright (c) 2017 - 2019, Intel Corporation. + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include "protocol.h" + +int mptcp_subflow_create_socket(struct sock *sk, struct socket **new_sock) +{ + struct mptcp_subflow_context *subflow; + struct net *net = sock_net(sk); + struct socket *sf; + int err; + + err = sock_create_kern(net, PF_INET, SOCK_STREAM, IPPROTO_TCP, &sf); + if (err) + return err; + + lock_sock(sf->sk); + + /* kernel sockets do not by default acquire net ref, but TCP timer + * needs it. + */ + sf->sk->sk_net_refcnt = 1; + get_net(net); + this_cpu_add(*net->core.sock_inuse, 1); + err = tcp_set_ulp(sf->sk, "mptcp"); + release_sock(sf->sk); + + if (err) + return err; + + subflow = mptcp_subflow_ctx(sf->sk); + pr_debug("subflow=%p", subflow); + + *new_sock = sf; + subflow->conn = sk; + + return 0; +} + +static struct mptcp_subflow_context *subflow_create_ctx(struct sock *sk, + gfp_t priority) +{ + struct inet_connection_sock *icsk = inet_csk(sk); + struct mptcp_subflow_context *ctx; + + ctx = kzalloc(sizeof(*ctx), priority); + if (!ctx) + return NULL; + + rcu_assign_pointer(icsk->icsk_ulp_data, ctx); + + pr_debug("subflow=%p", ctx); + + ctx->tcp_sock = sk; + + return ctx; +} + +static int subflow_ulp_init(struct sock *sk) +{ + struct mptcp_subflow_context *ctx; + struct tcp_sock *tp = tcp_sk(sk); + int err = 0; + + /* disallow attaching ULP to a socket unless it has been + * created with sock_create_kern() + */ + if (!sk->sk_kern_sock) { + err = -EOPNOTSUPP; + goto out; + } + + ctx = subflow_create_ctx(sk, GFP_KERNEL); + if (!ctx) { + err = -ENOMEM; + goto out; + } + + pr_debug("subflow=%p, family=%d", ctx, sk->sk_family); + + tp->is_mptcp = 1; +out: + return err; +} + +static void subflow_ulp_release(struct sock *sk) +{ + struct mptcp_subflow_context *ctx = mptcp_subflow_ctx(sk); + + if (!ctx) + return; + + kfree_rcu(ctx, rcu); +} + +static struct tcp_ulp_ops subflow_ulp_ops __read_mostly = { + .name = "mptcp", + .owner = THIS_MODULE, + .init = subflow_ulp_init, + .release = subflow_ulp_release, +}; + +void mptcp_subflow_init(void) +{ + if (tcp_register_ulp(&subflow_ulp_ops) != 0) + panic("MPTCP: failed to register subflows to ULP\n"); +} From patchwork Wed Jan 22 00:56:18 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Christoph Paasch X-Patchwork-Id: 1226856 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (no SPF record) smtp.mailfrom=lists.01.org (client-ip=2001:19d0:306:5::1; helo=ml01.01.org; envelope-from=mptcp-bounces@lists.01.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=quarantine dis=none) header.from=apple.com Authentication-Results: ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=apple.com header.i=@apple.com header.a=rsa-sha256 header.s=20180706 header.b=BxgYiUqF; dkim-atps=neutral Received: from ml01.01.org (ml01.01.org [IPv6:2001:19d0:306:5::1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 482Rnd6W1zz9sRR for ; Wed, 22 Jan 2020 11:57:09 +1100 (AEDT) Received: from ml01.vlan13.01.org (localhost [IPv6:::1]) by ml01.01.org (Postfix) with ESMTP id 274A51007B8F2; Tue, 21 Jan 2020 17:00:26 -0800 (PST) Received-SPF: Pass (mailfrom) identity=mailfrom; client-ip=17.151.62.66; helo=nwk-aaemail-lapp01.apple.com; envelope-from=cpaasch@apple.com; receiver= Received: from nwk-aaemail-lapp01.apple.com (nwk-aaemail-lapp01.apple.com [17.151.62.66]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ml01.01.org (Postfix) with ESMTPS id 4E3C21007B8DD for ; Tue, 21 Jan 2020 17:00:24 -0800 (PST) Received: from pps.filterd (nwk-aaemail-lapp01.apple.com [127.0.0.1]) by nwk-aaemail-lapp01.apple.com (8.16.0.27/8.16.0.27) with SMTP id 00M0urWt049855; Tue, 21 Jan 2020 16:57:05 -0800 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=apple.com; h=sender : from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding; s=20180706; bh=j6y/qth5lA7S1qecwO0e14xnPuR+LRFQola9iyZ9FyI=; b=BxgYiUqFroh/vsX5FFAmW34zsXlgkEoUWRgZe4sfQvGmQP/DjtpPBRWAKT17qqkrj8/Y +51EvkcsQrBJYqt6owLfFFjIAfDFznqWnN87A18SDTbRXJSuY5UAHGHjL+zasZdgONYZ 5aO0dlgypB0tyLLudIqK6XLturXKbdmKprSYYIns470DnEGepY6lQSbemzBANAHe882e Wv7Wfh1OuhxIQUqTQyWYctD3pKMMgyP1BOG2SvppsMiEviwf/cRUaIlXMlMAgqGz8KnC WNJBal4uXh+YUnRFx7ZIc/SSXYjG7Azwfh6ZqWkkpbEvcQ72bI9AaMsRhhkFc3yC5lOT IA== Received: from ma1-mtap-s01.corp.apple.com (ma1-mtap-s01.corp.apple.com [17.40.76.5]) by nwk-aaemail-lapp01.apple.com with ESMTP id 2xm1u51ya3-2 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NO); Tue, 21 Jan 2020 16:57:05 -0800 Received: from nwk-mmpp-sz13.apple.com (nwk-mmpp-sz13.apple.com [17.128.115.216]) by ma1-mtap-s01.corp.apple.com (Oracle Communications Messaging Server 8.0.2.4.20190507 64bit (built May 7 2019)) with ESMTPS id <0Q4H00HWQHAUJ720@ma1-mtap-s01.corp.apple.com>; Tue, 21 Jan 2020 16:57:02 -0800 (PST) Received: from process_milters-daemon.nwk-mmpp-sz13.apple.com by nwk-mmpp-sz13.apple.com (Oracle Communications Messaging Server 8.0.2.4.20190507 64bit (built May 7 2019)) id <0Q4H00100GR2PR00@nwk-mmpp-sz13.apple.com>; Tue, 21 Jan 2020 16:57:00 -0800 (PST) X-Va-A: X-Va-T-CD: 4b1e0bf36502e052fc75ad21b706ed24 X-Va-E-CD: 117345fd82fd154a76c01a1fc8b877d9 X-Va-R-CD: 27f76d78a33ea57ff72fdcf7e72f3c3f X-Va-CD: 0 X-Va-ID: 9590bf0b-62be-4777-9353-09a4f7481e10 X-V-A: X-V-T-CD: 4b1e0bf36502e052fc75ad21b706ed24 X-V-E-CD: 117345fd82fd154a76c01a1fc8b877d9 X-V-R-CD: 27f76d78a33ea57ff72fdcf7e72f3c3f X-V-CD: 0 X-V-ID: df517f51-71c4-44fb-b7fc-c217e76aaac0 X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:,, definitions=2020-01-17_05:,, signatures=0 Received: from localhost ([17.192.155.241]) by nwk-mmpp-sz13.apple.com (Oracle Communications Messaging Server 8.0.2.4.20190507 64bit (built May 7 2019)) with ESMTPSA id <0Q4H0011THAZ4Y50@nwk-mmpp-sz13.apple.com>; Tue, 21 Jan 2020 16:56:59 -0800 (PST) Sender: cpaasch@apple.com From: Christoph Paasch To: netdev@vger.kernel.org Date: Tue, 21 Jan 2020 16:56:18 -0800 Message-id: <20200122005633.21229-5-cpaasch@apple.com> X-Mailer: git-send-email 2.23.0 In-reply-to: <20200122005633.21229-1-cpaasch@apple.com> References: <20200122005633.21229-1-cpaasch@apple.com> MIME-version: 1.0 X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:, , definitions=2020-01-17_05:, , signatures=0 Message-ID-Hash: OLACOLP4VJDX67XPXXSWVXUBS7RWHHTP X-Message-ID-Hash: OLACOLP4VJDX67XPXXSWVXUBS7RWHHTP X-MailFrom: cpaasch@apple.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; suspicious-header CC: mptcp@lists.01.org X-Mailman-Version: 3.1.1 Precedence: list Subject: [MPTCP] [PATCH net-next v3 04/19] mptcp: Handle MP_CAPABLE options for outgoing connections List-Id: Discussions regarding MPTCP upstreaming Archived-At: List-Archive: List-Help: List-Post: List-Subscribe: List-Unsubscribe: From: Peter Krystad Add hooks to tcp_output.c to add MP_CAPABLE to an outgoing SYN request, to capture the MP_CAPABLE in the received SYN-ACK, to add MP_CAPABLE to the final ACK of the three-way handshake. Use the .sk_rx_dst_set() handler in the subflow proto to capture when the responding SYN-ACK is received and notify the MPTCP connection layer. Co-developed-by: Paolo Abeni Signed-off-by: Paolo Abeni Co-developed-by: Florian Westphal Signed-off-by: Florian Westphal Signed-off-by: Peter Krystad Signed-off-by: Christoph Paasch --- include/linux/tcp.h | 3 + include/net/mptcp.h | 57 +++++++++ net/ipv4/tcp_input.c | 6 + net/ipv4/tcp_output.c | 44 +++++++ net/ipv6/tcp_ipv6.c | 6 + net/mptcp/options.c | 100 ++++++++++++++++ net/mptcp/protocol.c | 163 +++++++++++++++++++++---- net/mptcp/protocol.h | 40 ++++++- net/mptcp/subflow.c | 268 +++++++++++++++++++++++++++++++++++++++++- 9 files changed, 663 insertions(+), 24 deletions(-) diff --git a/include/linux/tcp.h b/include/linux/tcp.h index 877947475814..e9ee06d887fa 100644 --- a/include/linux/tcp.h +++ b/include/linux/tcp.h @@ -137,6 +137,9 @@ struct tcp_request_sock { const struct tcp_request_sock_ops *af_specific; u64 snt_synack; /* first SYNACK sent time */ bool tfo_listener; +#if IS_ENABLED(CONFIG_MPTCP) + bool is_mptcp; +#endif u32 txhash; u32 rcv_isn; u32 snt_isn; diff --git a/include/net/mptcp.h b/include/net/mptcp.h index 3daec2ceb3ff..eabc57c3fde4 100644 --- a/include/net/mptcp.h +++ b/include/net/mptcp.h @@ -39,8 +39,27 @@ struct mptcp_out_options { void mptcp_init(void); +static inline bool sk_is_mptcp(const struct sock *sk) +{ + return tcp_sk(sk)->is_mptcp; +} + +static inline bool rsk_is_mptcp(const struct request_sock *req) +{ + return tcp_rsk(req)->is_mptcp; +} + void mptcp_parse_option(const unsigned char *ptr, int opsize, struct tcp_options_received *opt_rx); +bool mptcp_syn_options(struct sock *sk, unsigned int *size, + struct mptcp_out_options *opts); +void mptcp_rcv_synsent(struct sock *sk); +bool mptcp_synack_options(const struct request_sock *req, unsigned int *size, + struct mptcp_out_options *opts); +bool mptcp_established_options(struct sock *sk, struct sk_buff *skb, + unsigned int *size, unsigned int remaining, + struct mptcp_out_options *opts); + void mptcp_write_options(__be32 *ptr, struct mptcp_out_options *opts); /* move the skb extension owership, with the assumption that 'to' is @@ -89,11 +108,47 @@ static inline void mptcp_init(void) { } +static inline bool sk_is_mptcp(const struct sock *sk) +{ + return false; +} + +static inline bool rsk_is_mptcp(const struct request_sock *req) +{ + return false; +} + static inline void mptcp_parse_option(const unsigned char *ptr, int opsize, struct tcp_options_received *opt_rx) { } +static inline bool mptcp_syn_options(struct sock *sk, unsigned int *size, + struct mptcp_out_options *opts) +{ + return false; +} + +static inline void mptcp_rcv_synsent(struct sock *sk) +{ +} + +static inline bool mptcp_synack_options(const struct request_sock *req, + unsigned int *size, + struct mptcp_out_options *opts) +{ + return false; +} + +static inline bool mptcp_established_options(struct sock *sk, + struct sk_buff *skb, + unsigned int *size, + unsigned int remaining, + struct mptcp_out_options *opts) +{ + return false; +} + static inline void mptcp_skb_ext_move(struct sk_buff *to, const struct sk_buff *from) { @@ -107,6 +162,8 @@ static inline bool mptcp_skb_can_collapse(const struct sk_buff *to, #endif /* CONFIG_MPTCP */ +void mptcp_handle_ipv6_mapped(struct sock *sk, bool mapped); + #if IS_ENABLED(CONFIG_MPTCP_IPV6) int mptcpv6_init(void); #elif IS_ENABLED(CONFIG_IPV6) diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 3458ee13e6f0..5165c8de47ee 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -5978,6 +5978,9 @@ static int tcp_rcv_synsent_state_process(struct sock *sk, struct sk_buff *skb, tcp_sync_mss(sk, icsk->icsk_pmtu_cookie); tcp_initialize_rcv_mss(sk); + if (sk_is_mptcp(sk)) + mptcp_rcv_synsent(sk); + /* Remember, tcp_poll() does not lock socket! * Change state from SYN-SENT only after copied_seq * is initialized. */ @@ -6600,6 +6603,9 @@ int tcp_conn_request(struct request_sock_ops *rsk_ops, tcp_rsk(req)->af_specific = af_ops; tcp_rsk(req)->ts_off = 0; +#if IS_ENABLED(CONFIG_MPTCP) + tcp_rsk(req)->is_mptcp = 0; +#endif tcp_clear_options(&tmp_opt); tmp_opt.mss_clamp = af_ops->mss_clamp; diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index b6cf87b02b01..c6797a388921 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -597,6 +597,22 @@ static void smc_set_option_cond(const struct tcp_sock *tp, #endif } +static void mptcp_set_option_cond(const struct request_sock *req, + struct tcp_out_options *opts, + unsigned int *remaining) +{ + if (rsk_is_mptcp(req)) { + unsigned int size; + + if (mptcp_synack_options(req, &size, &opts->mptcp)) { + if (*remaining >= size) { + opts->options |= OPTION_MPTCP; + *remaining -= size; + } + } + } +} + /* Compute TCP options for SYN packets. This is not the final * network wire format yet. */ @@ -666,6 +682,15 @@ static unsigned int tcp_syn_options(struct sock *sk, struct sk_buff *skb, smc_set_option(tp, opts, &remaining); + if (sk_is_mptcp(sk)) { + unsigned int size; + + if (mptcp_syn_options(sk, &size, &opts->mptcp)) { + opts->options |= OPTION_MPTCP; + remaining -= size; + } + } + return MAX_TCP_OPTION_SPACE - remaining; } @@ -727,6 +752,8 @@ static unsigned int tcp_synack_options(const struct sock *sk, } } + mptcp_set_option_cond(req, opts, &remaining); + smc_set_option_cond(tcp_sk(sk), ireq, opts, &remaining); return MAX_TCP_OPTION_SPACE - remaining; @@ -764,6 +791,23 @@ static unsigned int tcp_established_options(struct sock *sk, struct sk_buff *skb size += TCPOLEN_TSTAMP_ALIGNED; } + /* MPTCP options have precedence over SACK for the limited TCP + * option space because a MPTCP connection would be forced to + * fall back to regular TCP if a required multipath option is + * missing. SACK still gets a chance to use whatever space is + * left. + */ + if (sk_is_mptcp(sk)) { + unsigned int remaining = MAX_TCP_OPTION_SPACE - size; + unsigned int opt_size = 0; + + if (mptcp_established_options(sk, skb, &opt_size, remaining, + &opts->mptcp)) { + opts->options |= OPTION_MPTCP; + size += opt_size; + } + } + eff_sacks = tp->rx_opt.num_sacks + tp->rx_opt.dsack; if (unlikely(eff_sacks)) { const unsigned int remaining = MAX_TCP_OPTION_SPACE - size; diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c index 60068ffde1d9..33a578a3eb3a 100644 --- a/net/ipv6/tcp_ipv6.c +++ b/net/ipv6/tcp_ipv6.c @@ -238,6 +238,8 @@ static int tcp_v6_connect(struct sock *sk, struct sockaddr *uaddr, sin.sin_addr.s_addr = usin->sin6_addr.s6_addr32[3]; icsk->icsk_af_ops = &ipv6_mapped; + if (sk_is_mptcp(sk)) + mptcp_handle_ipv6_mapped(sk, true); sk->sk_backlog_rcv = tcp_v4_do_rcv; #ifdef CONFIG_TCP_MD5SIG tp->af_specific = &tcp_sock_ipv6_mapped_specific; @@ -248,6 +250,8 @@ static int tcp_v6_connect(struct sock *sk, struct sockaddr *uaddr, if (err) { icsk->icsk_ext_hdr_len = exthdrlen; icsk->icsk_af_ops = &ipv6_specific; + if (sk_is_mptcp(sk)) + mptcp_handle_ipv6_mapped(sk, false); sk->sk_backlog_rcv = tcp_v6_do_rcv; #ifdef CONFIG_TCP_MD5SIG tp->af_specific = &tcp_sock_ipv6_specific; @@ -1203,6 +1207,8 @@ static struct sock *tcp_v6_syn_recv_sock(const struct sock *sk, struct sk_buff * newnp->saddr = newsk->sk_v6_rcv_saddr; inet_csk(newsk)->icsk_af_ops = &ipv6_mapped; + if (sk_is_mptcp(newsk)) + mptcp_handle_ipv6_mapped(newsk, true); newsk->sk_backlog_rcv = tcp_v4_do_rcv; #ifdef CONFIG_TCP_MD5SIG newtp->af_specific = &tcp_sock_ipv6_mapped_specific; diff --git a/net/mptcp/options.c b/net/mptcp/options.c index b7a31c0e5283..52ff2301b68b 100644 --- a/net/mptcp/options.c +++ b/net/mptcp/options.c @@ -72,14 +72,114 @@ void mptcp_parse_option(const unsigned char *ptr, int opsize, } } +void mptcp_get_options(const struct sk_buff *skb, + struct tcp_options_received *opt_rx) +{ + const unsigned char *ptr; + const struct tcphdr *th = tcp_hdr(skb); + int length = (th->doff * 4) - sizeof(struct tcphdr); + + ptr = (const unsigned char *)(th + 1); + + while (length > 0) { + int opcode = *ptr++; + int opsize; + + switch (opcode) { + case TCPOPT_EOL: + return; + case TCPOPT_NOP: /* Ref: RFC 793 section 3.1 */ + length--; + continue; + default: + opsize = *ptr++; + if (opsize < 2) /* "silly options" */ + return; + if (opsize > length) + return; /* don't parse partial options */ + if (opcode == TCPOPT_MPTCP) + mptcp_parse_option(ptr, opsize, opt_rx); + ptr += opsize - 2; + length -= opsize; + } + } +} + +bool mptcp_syn_options(struct sock *sk, unsigned int *size, + struct mptcp_out_options *opts) +{ + struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(sk); + + if (subflow->request_mptcp) { + pr_debug("local_key=%llu", subflow->local_key); + opts->suboptions = OPTION_MPTCP_MPC_SYN; + opts->sndr_key = subflow->local_key; + *size = TCPOLEN_MPTCP_MPC_SYN; + return true; + } + return false; +} + +void mptcp_rcv_synsent(struct sock *sk) +{ + struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(sk); + struct tcp_sock *tp = tcp_sk(sk); + + pr_debug("subflow=%p", subflow); + if (subflow->request_mptcp && tp->rx_opt.mptcp.mp_capable) { + subflow->mp_capable = 1; + subflow->remote_key = tp->rx_opt.mptcp.sndr_key; + } else { + tcp_sk(sk)->is_mptcp = 0; + } +} + +bool mptcp_established_options(struct sock *sk, struct sk_buff *skb, + unsigned int *size, unsigned int remaining, + struct mptcp_out_options *opts) +{ + struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(sk); + + if (subflow->mp_capable && !subflow->fourth_ack) { + opts->suboptions = OPTION_MPTCP_MPC_ACK; + opts->sndr_key = subflow->local_key; + opts->rcvr_key = subflow->remote_key; + *size = TCPOLEN_MPTCP_MPC_ACK; + subflow->fourth_ack = 1; + pr_debug("subflow=%p, local_key=%llu, remote_key=%llu", + subflow, subflow->local_key, subflow->remote_key); + return true; + } + return false; +} + +bool mptcp_synack_options(const struct request_sock *req, unsigned int *size, + struct mptcp_out_options *opts) +{ + struct mptcp_subflow_request_sock *subflow_req = mptcp_subflow_rsk(req); + + if (subflow_req->mp_capable) { + opts->suboptions = OPTION_MPTCP_MPC_SYNACK; + opts->sndr_key = subflow_req->local_key; + *size = TCPOLEN_MPTCP_MPC_SYNACK; + pr_debug("subflow_req=%p, local_key=%llu", + subflow_req, subflow_req->local_key); + return true; + } + return false; +} + void mptcp_write_options(__be32 *ptr, struct mptcp_out_options *opts) { if ((OPTION_MPTCP_MPC_SYN | + OPTION_MPTCP_MPC_SYNACK | OPTION_MPTCP_MPC_ACK) & opts->suboptions) { u8 len; if (OPTION_MPTCP_MPC_SYN & opts->suboptions) len = TCPOLEN_MPTCP_MPC_SYN; + else if (OPTION_MPTCP_MPC_SYNACK & opts->suboptions) + len = TCPOLEN_MPTCP_MPC_SYNACK; else len = TCPOLEN_MPTCP_MPC_ACK; diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c index 294b03a0393a..bdd58da1e4f6 100644 --- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -25,12 +25,28 @@ */ static struct socket *__mptcp_nmpc_socket(const struct mptcp_sock *msk) { - if (!msk->subflow) + if (!msk->subflow || mptcp_subflow_ctx(msk->subflow->sk)->fourth_ack) return NULL; return msk->subflow; } +/* if msk has a single subflow, and the mp_capable handshake is failed, + * return it. + * Otherwise returns NULL + */ +static struct socket *__mptcp_tcp_fallback(const struct mptcp_sock *msk) +{ + struct socket *ssock = __mptcp_nmpc_socket(msk); + + sock_owned_by_me((const struct sock *)msk); + + if (!ssock || sk_is_mptcp(ssock->sk)) + return NULL; + + return ssock; +} + static bool __mptcp_can_create_subflow(const struct mptcp_sock *msk) { return ((struct sock *)msk)->sk_state == TCP_CLOSE; @@ -56,6 +72,7 @@ static struct socket *__mptcp_socket_create(struct mptcp_sock *msk, int state) msk->subflow = ssock; subflow = mptcp_subflow_ctx(ssock->sk); + list_add(&subflow->node, &msk->conn_list); subflow->request_mptcp = 1; set_state: @@ -64,66 +81,169 @@ static struct socket *__mptcp_socket_create(struct mptcp_sock *msk, int state) return ssock; } +static struct sock *mptcp_subflow_get(const struct mptcp_sock *msk) +{ + struct mptcp_subflow_context *subflow; + + sock_owned_by_me((const struct sock *)msk); + + mptcp_for_each_subflow(msk, subflow) { + return mptcp_subflow_tcp_sock(subflow); + } + + return NULL; +} + static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len) { struct mptcp_sock *msk = mptcp_sk(sk); - struct socket *subflow = msk->subflow; + struct socket *ssock; + struct sock *ssk; + int ret; if (msg->msg_flags & ~(MSG_MORE | MSG_DONTWAIT | MSG_NOSIGNAL)) return -EOPNOTSUPP; - return sock_sendmsg(subflow, msg); + lock_sock(sk); + ssock = __mptcp_tcp_fallback(msk); + if (ssock) { + pr_debug("fallback passthrough"); + ret = sock_sendmsg(ssock, msg); + release_sock(sk); + return ret; + } + + ssk = mptcp_subflow_get(msk); + if (!ssk) { + release_sock(sk); + return -ENOTCONN; + } + + ret = sock_sendmsg(ssk->sk_socket, msg); + + release_sock(sk); + return ret; } static int mptcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, int nonblock, int flags, int *addr_len) { struct mptcp_sock *msk = mptcp_sk(sk); - struct socket *subflow = msk->subflow; + struct socket *ssock; + struct sock *ssk; + int copied = 0; if (msg->msg_flags & ~(MSG_WAITALL | MSG_DONTWAIT)) return -EOPNOTSUPP; - return sock_recvmsg(subflow, msg, flags); + lock_sock(sk); + ssock = __mptcp_tcp_fallback(msk); + if (ssock) { + pr_debug("fallback-read subflow=%p", + mptcp_subflow_ctx(ssock->sk)); + copied = sock_recvmsg(ssock, msg, flags); + release_sock(sk); + return copied; + } + + ssk = mptcp_subflow_get(msk); + if (!ssk) { + release_sock(sk); + return -ENOTCONN; + } + + copied = sock_recvmsg(ssk->sk_socket, msg, flags); + + release_sock(sk); + + return copied; +} + +/* subflow sockets can be either outgoing (connect) or incoming + * (accept). + * + * Outgoing subflows use in-kernel sockets. + * Incoming subflows do not have their own 'struct socket' allocated, + * so we need to use tcp_close() after detaching them from the mptcp + * parent socket. + */ +static void __mptcp_close_ssk(struct sock *sk, struct sock *ssk, + struct mptcp_subflow_context *subflow, + long timeout) +{ + struct socket *sock = READ_ONCE(ssk->sk_socket); + + list_del(&subflow->node); + + if (sock && sock != sk->sk_socket) { + /* outgoing subflow */ + sock_release(sock); + } else { + /* incoming subflow */ + tcp_close(ssk, timeout); + } } static int mptcp_init_sock(struct sock *sk) { + struct mptcp_sock *msk = mptcp_sk(sk); + + INIT_LIST_HEAD(&msk->conn_list); + return 0; } static void mptcp_close(struct sock *sk, long timeout) { + struct mptcp_subflow_context *subflow, *tmp; struct mptcp_sock *msk = mptcp_sk(sk); - struct socket *ssock; inet_sk_state_store(sk, TCP_CLOSE); - ssock = __mptcp_nmpc_socket(msk); - if (ssock) { - pr_debug("subflow=%p", mptcp_subflow_ctx(ssock->sk)); - sock_release(ssock); + lock_sock(sk); + + list_for_each_entry_safe(subflow, tmp, &msk->conn_list, node) { + struct sock *ssk = mptcp_subflow_tcp_sock(subflow); + + __mptcp_close_ssk(sk, ssk, subflow, timeout); } - sock_orphan(sk); - sock_put(sk); + release_sock(sk); + sk_common_release(sk); } -static int mptcp_connect(struct sock *sk, struct sockaddr *saddr, int len) +static int mptcp_get_port(struct sock *sk, unsigned short snum) { struct mptcp_sock *msk = mptcp_sk(sk); - int err; + struct socket *ssock; - saddr->sa_family = AF_INET; + ssock = __mptcp_nmpc_socket(msk); + pr_debug("msk=%p, subflow=%p", msk, ssock); + if (WARN_ON_ONCE(!ssock)) + return -EINVAL; - pr_debug("msk=%p, subflow=%p", msk, - mptcp_subflow_ctx(msk->subflow->sk)); + return inet_csk_get_port(ssock->sk, snum); +} - err = kernel_connect(msk->subflow, saddr, len, 0); +void mptcp_finish_connect(struct sock *ssk) +{ + struct mptcp_subflow_context *subflow; + struct mptcp_sock *msk; + struct sock *sk; - sk->sk_state = TCP_ESTABLISHED; + subflow = mptcp_subflow_ctx(ssk); - return err; + if (!subflow->mp_capable) + return; + + sk = subflow->conn; + msk = mptcp_sk(sk); + + /* the socket is not connected yet, no msk/subflow ops can access/race + * accessing the field below + */ + WRITE_ONCE(msk->remote_key, subflow->remote_key); + WRITE_ONCE(msk->local_key, subflow->local_key); } static struct proto mptcp_prot = { @@ -132,13 +252,12 @@ static struct proto mptcp_prot = { .init = mptcp_init_sock, .close = mptcp_close, .accept = inet_csk_accept, - .connect = mptcp_connect, .shutdown = tcp_shutdown, .sendmsg = mptcp_sendmsg, .recvmsg = mptcp_recvmsg, .hash = inet_hash, .unhash = inet_unhash, - .get_port = inet_csk_get_port, + .get_port = mptcp_get_port, .obj_size = sizeof(struct mptcp_sock), .no_autobind = true, }; diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h index 543d4d5d8985..bd66e7415515 100644 --- a/net/mptcp/protocol.h +++ b/net/mptcp/protocol.h @@ -40,19 +40,47 @@ struct mptcp_sock { /* inet_connection_sock must be the first member */ struct inet_connection_sock sk; + u64 local_key; + u64 remote_key; + struct list_head conn_list; struct socket *subflow; /* outgoing connect/listener/!mp_capable */ }; +#define mptcp_for_each_subflow(__msk, __subflow) \ + list_for_each_entry(__subflow, &((__msk)->conn_list), node) + static inline struct mptcp_sock *mptcp_sk(const struct sock *sk) { return (struct mptcp_sock *)sk; } +struct mptcp_subflow_request_sock { + struct tcp_request_sock sk; + u8 mp_capable : 1, + mp_join : 1, + backup : 1; + u64 local_key; + u64 remote_key; +}; + +static inline struct mptcp_subflow_request_sock * +mptcp_subflow_rsk(const struct request_sock *rsk) +{ + return (struct mptcp_subflow_request_sock *)rsk; +} + /* MPTCP subflow context */ struct mptcp_subflow_context { - u32 request_mptcp : 1; /* send MP_CAPABLE */ + struct list_head node;/* conn_list of subflows */ + u64 local_key; + u64 remote_key; + u32 request_mptcp : 1, /* send MP_CAPABLE */ + mp_capable : 1, /* remote is MPTCP capable */ + fourth_ack : 1, /* send initial DSS */ + conn_finished : 1; struct sock *tcp_sock; /* tcp sk backpointer */ struct sock *conn; /* parent mptcp_sock */ + const struct inet_connection_sock_af_ops *icsk_af_ops; struct rcu_head rcu; }; @@ -74,4 +102,14 @@ mptcp_subflow_tcp_sock(const struct mptcp_subflow_context *subflow) void mptcp_subflow_init(void); int mptcp_subflow_create_socket(struct sock *sk, struct socket **new_sock); +extern const struct inet_connection_sock_af_ops ipv4_specific; +#if IS_ENABLED(CONFIG_MPTCP_IPV6) +extern const struct inet_connection_sock_af_ops ipv6_specific; +#endif + +void mptcp_get_options(const struct sk_buff *skb, + struct tcp_options_received *opt_rx); + +void mptcp_finish_connect(struct sock *sk); + #endif /* __MPTCP_PROTOCOL_H */ diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c index bf8139353653..df3192305967 100644 --- a/net/mptcp/subflow.c +++ b/net/mptcp/subflow.c @@ -12,9 +12,188 @@ #include #include #include +#if IS_ENABLED(CONFIG_MPTCP_IPV6) +#include +#endif #include #include "protocol.h" +static void subflow_init_req(struct request_sock *req, + const struct sock *sk_listener, + struct sk_buff *skb) +{ + struct mptcp_subflow_context *listener = mptcp_subflow_ctx(sk_listener); + struct mptcp_subflow_request_sock *subflow_req = mptcp_subflow_rsk(req); + struct tcp_options_received rx_opt; + + pr_debug("subflow_req=%p, listener=%p", subflow_req, listener); + + memset(&rx_opt.mptcp, 0, sizeof(rx_opt.mptcp)); + mptcp_get_options(skb, &rx_opt); + + subflow_req->mp_capable = 0; + +#ifdef CONFIG_TCP_MD5SIG + /* no MPTCP if MD5SIG is enabled on this socket or we may run out of + * TCP option space. + */ + if (rcu_access_pointer(tcp_sk(sk_listener)->md5sig_info)) + return; +#endif + + if (rx_opt.mptcp.mp_capable && listener->request_mptcp) { + subflow_req->mp_capable = 1; + subflow_req->remote_key = rx_opt.mptcp.sndr_key; + } +} + +static void subflow_v4_init_req(struct request_sock *req, + const struct sock *sk_listener, + struct sk_buff *skb) +{ + tcp_rsk(req)->is_mptcp = 1; + + tcp_request_sock_ipv4_ops.init_req(req, sk_listener, skb); + + subflow_init_req(req, sk_listener, skb); +} + +#if IS_ENABLED(CONFIG_MPTCP_IPV6) +static void subflow_v6_init_req(struct request_sock *req, + const struct sock *sk_listener, + struct sk_buff *skb) +{ + tcp_rsk(req)->is_mptcp = 1; + + tcp_request_sock_ipv6_ops.init_req(req, sk_listener, skb); + + subflow_init_req(req, sk_listener, skb); +} +#endif + +static void subflow_finish_connect(struct sock *sk, const struct sk_buff *skb) +{ + struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(sk); + + subflow->icsk_af_ops->sk_rx_dst_set(sk, skb); + + if (subflow->conn && !subflow->conn_finished) { + pr_debug("subflow=%p, remote_key=%llu", mptcp_subflow_ctx(sk), + subflow->remote_key); + mptcp_finish_connect(sk); + subflow->conn_finished = 1; + } +} + +static struct request_sock_ops subflow_request_sock_ops; +static struct tcp_request_sock_ops subflow_request_sock_ipv4_ops; + +static int subflow_v4_conn_request(struct sock *sk, struct sk_buff *skb) +{ + struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(sk); + + pr_debug("subflow=%p", subflow); + + /* Never answer to SYNs sent to broadcast or multicast */ + if (skb_rtable(skb)->rt_flags & (RTCF_BROADCAST | RTCF_MULTICAST)) + goto drop; + + return tcp_conn_request(&subflow_request_sock_ops, + &subflow_request_sock_ipv4_ops, + sk, skb); +drop: + tcp_listendrop(sk); + return 0; +} + +#if IS_ENABLED(CONFIG_MPTCP_IPV6) +static struct tcp_request_sock_ops subflow_request_sock_ipv6_ops; +static struct inet_connection_sock_af_ops subflow_v6_specific; +static struct inet_connection_sock_af_ops subflow_v6m_specific; + +static int subflow_v6_conn_request(struct sock *sk, struct sk_buff *skb) +{ + struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(sk); + + pr_debug("subflow=%p", subflow); + + if (skb->protocol == htons(ETH_P_IP)) + return subflow_v4_conn_request(sk, skb); + + if (!ipv6_unicast_destination(skb)) + goto drop; + + return tcp_conn_request(&subflow_request_sock_ops, + &subflow_request_sock_ipv6_ops, sk, skb); + +drop: + tcp_listendrop(sk); + return 0; /* don't send reset */ +} +#endif + +static struct sock *subflow_syn_recv_sock(const struct sock *sk, + struct sk_buff *skb, + struct request_sock *req, + struct dst_entry *dst, + struct request_sock *req_unhash, + bool *own_req) +{ + struct mptcp_subflow_context *listener = mptcp_subflow_ctx(sk); + struct sock *child; + + pr_debug("listener=%p, req=%p, conn=%p", listener, req, listener->conn); + + /* if the sk is MP_CAPABLE, we already received the client key */ + + child = listener->icsk_af_ops->syn_recv_sock(sk, skb, req, dst, + req_unhash, own_req); + + if (child && *own_req) { + if (!mptcp_subflow_ctx(child)) { + pr_debug("Closing child socket"); + inet_sk_set_state(child, TCP_CLOSE); + sock_set_flag(child, SOCK_DEAD); + inet_csk_destroy_sock(child); + child = NULL; + } + } + + return child; +} + +static struct inet_connection_sock_af_ops subflow_specific; + +static struct inet_connection_sock_af_ops * +subflow_default_af_ops(struct sock *sk) +{ +#if IS_ENABLED(CONFIG_MPTCP_IPV6) + if (sk->sk_family == AF_INET6) + return &subflow_v6_specific; +#endif + return &subflow_specific; +} + +void mptcp_handle_ipv6_mapped(struct sock *sk, bool mapped) +{ +#if IS_ENABLED(CONFIG_MPTCP_IPV6) + struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(sk); + struct inet_connection_sock *icsk = inet_csk(sk); + struct inet_connection_sock_af_ops *target; + + target = mapped ? &subflow_v6m_specific : subflow_default_af_ops(sk); + + pr_debug("subflow=%p family=%d ops=%p target=%p mapped=%d", + subflow, sk->sk_family, icsk->icsk_af_ops, target, mapped); + + if (likely(icsk->icsk_af_ops == target)) + return; + + subflow->icsk_af_ops = icsk->icsk_af_ops; + icsk->icsk_af_ops = target; +#endif +} + int mptcp_subflow_create_socket(struct sock *sk, struct socket **new_sock) { struct mptcp_subflow_context *subflow; @@ -22,7 +201,8 @@ int mptcp_subflow_create_socket(struct sock *sk, struct socket **new_sock) struct socket *sf; int err; - err = sock_create_kern(net, PF_INET, SOCK_STREAM, IPPROTO_TCP, &sf); + err = sock_create_kern(net, sk->sk_family, SOCK_STREAM, IPPROTO_TCP, + &sf); if (err) return err; @@ -60,6 +240,7 @@ static struct mptcp_subflow_context *subflow_create_ctx(struct sock *sk, return NULL; rcu_assign_pointer(icsk->icsk_ulp_data, ctx); + INIT_LIST_HEAD(&ctx->node); pr_debug("subflow=%p", ctx); @@ -70,6 +251,7 @@ static struct mptcp_subflow_context *subflow_create_ctx(struct sock *sk, static int subflow_ulp_init(struct sock *sk) { + struct inet_connection_sock *icsk = inet_csk(sk); struct mptcp_subflow_context *ctx; struct tcp_sock *tp = tcp_sk(sk); int err = 0; @@ -91,6 +273,8 @@ static int subflow_ulp_init(struct sock *sk) pr_debug("subflow=%p, family=%d", ctx, sk->sk_family); tp->is_mptcp = 1; + ctx->icsk_af_ops = icsk->icsk_af_ops; + icsk->icsk_af_ops = subflow_default_af_ops(sk); out: return err; } @@ -105,15 +289,97 @@ static void subflow_ulp_release(struct sock *sk) kfree_rcu(ctx, rcu); } +static void subflow_ulp_fallback(struct sock *sk) +{ + struct inet_connection_sock *icsk = inet_csk(sk); + + icsk->icsk_ulp_ops = NULL; + rcu_assign_pointer(icsk->icsk_ulp_data, NULL); + tcp_sk(sk)->is_mptcp = 0; +} + +static void subflow_ulp_clone(const struct request_sock *req, + struct sock *newsk, + const gfp_t priority) +{ + struct mptcp_subflow_request_sock *subflow_req = mptcp_subflow_rsk(req); + struct mptcp_subflow_context *old_ctx = mptcp_subflow_ctx(newsk); + struct mptcp_subflow_context *new_ctx; + + if (!subflow_req->mp_capable) { + subflow_ulp_fallback(newsk); + return; + } + + new_ctx = subflow_create_ctx(newsk, priority); + if (new_ctx == NULL) { + subflow_ulp_fallback(newsk); + return; + } + + new_ctx->conn_finished = 1; + new_ctx->icsk_af_ops = old_ctx->icsk_af_ops; + new_ctx->mp_capable = 1; + new_ctx->fourth_ack = 1; + new_ctx->remote_key = subflow_req->remote_key; + new_ctx->local_key = subflow_req->local_key; +} + static struct tcp_ulp_ops subflow_ulp_ops __read_mostly = { .name = "mptcp", .owner = THIS_MODULE, .init = subflow_ulp_init, .release = subflow_ulp_release, + .clone = subflow_ulp_clone, }; +static int subflow_ops_init(struct request_sock_ops *subflow_ops) +{ + subflow_ops->obj_size = sizeof(struct mptcp_subflow_request_sock); + subflow_ops->slab_name = "request_sock_subflow"; + + subflow_ops->slab = kmem_cache_create(subflow_ops->slab_name, + subflow_ops->obj_size, 0, + SLAB_ACCOUNT | + SLAB_TYPESAFE_BY_RCU, + NULL); + if (!subflow_ops->slab) + return -ENOMEM; + + return 0; +} + void mptcp_subflow_init(void) { + subflow_request_sock_ops = tcp_request_sock_ops; + if (subflow_ops_init(&subflow_request_sock_ops) != 0) + panic("MPTCP: failed to init subflow request sock ops\n"); + + subflow_request_sock_ipv4_ops = tcp_request_sock_ipv4_ops; + subflow_request_sock_ipv4_ops.init_req = subflow_v4_init_req; + + subflow_specific = ipv4_specific; + subflow_specific.conn_request = subflow_v4_conn_request; + subflow_specific.syn_recv_sock = subflow_syn_recv_sock; + subflow_specific.sk_rx_dst_set = subflow_finish_connect; + +#if IS_ENABLED(CONFIG_MPTCP_IPV6) + subflow_request_sock_ipv6_ops = tcp_request_sock_ipv6_ops; + subflow_request_sock_ipv6_ops.init_req = subflow_v6_init_req; + + subflow_v6_specific = ipv6_specific; + subflow_v6_specific.conn_request = subflow_v6_conn_request; + subflow_v6_specific.syn_recv_sock = subflow_syn_recv_sock; + subflow_v6_specific.sk_rx_dst_set = subflow_finish_connect; + + subflow_v6m_specific = subflow_v6_specific; + subflow_v6m_specific.queue_xmit = ipv4_specific.queue_xmit; + subflow_v6m_specific.send_check = ipv4_specific.send_check; + subflow_v6m_specific.net_header_len = ipv4_specific.net_header_len; + subflow_v6m_specific.mtu_reduced = ipv4_specific.mtu_reduced; + subflow_v6m_specific.net_frag_header_len = 0; +#endif + if (tcp_register_ulp(&subflow_ulp_ops) != 0) panic("MPTCP: failed to register subflows to ULP\n"); } From patchwork Wed Jan 22 00:56:19 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Christoph Paasch X-Patchwork-Id: 1226851 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (no SPF record) smtp.mailfrom=lists.01.org (client-ip=198.145.21.10; helo=ml01.01.org; envelope-from=mptcp-bounces@lists.01.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=quarantine dis=none) header.from=apple.com Authentication-Results: ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=apple.com header.i=@apple.com header.a=rsa-sha256 header.s=20180706 header.b=mOoytJQh; dkim-atps=neutral Received: from ml01.01.org (ml01.01.org [198.145.21.10]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 482Rnb72XCz9sRR for ; Wed, 22 Jan 2020 11:57:07 +1100 (AEDT) Received: from ml01.vlan13.01.org (localhost [IPv6:::1]) by ml01.01.org (Postfix) with ESMTP id CEFF21007B8DB; Tue, 21 Jan 2020 17:00:23 -0800 (PST) Received-SPF: Pass (mailfrom) identity=mailfrom; client-ip=17.151.62.67; helo=nwk-aaemail-lapp02.apple.com; envelope-from=cpaasch@apple.com; receiver= Received: from nwk-aaemail-lapp02.apple.com (nwk-aaemail-lapp02.apple.com [17.151.62.67]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ml01.01.org (Postfix) with ESMTPS id F09781007B8C8 for ; Tue, 21 Jan 2020 17:00:21 -0800 (PST) Received: from pps.filterd (nwk-aaemail-lapp02.apple.com [127.0.0.1]) by nwk-aaemail-lapp02.apple.com (8.16.0.27/8.16.0.27) with SMTP id 00M0v2eF020009; Tue, 21 Jan 2020 16:57:03 -0800 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=apple.com; h=sender : from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding; s=20180706; bh=TsTeoBmuWyStiALMY2D/lGyttqvbvt4CgCxbwE6mUg0=; b=mOoytJQhy0kaJubL1C6Iykb+iidDsrCLRFrd3HgbR2A+b0MR8BPNhmFzNcGoVEzfWD3Y +xRSSVkDNo7j+0eAZzBwarGLV9ejmN5/3nv5QPFBr51JK6WwbN9k4w6YVkqQh4a/arR8 Maw69jh8jTpUqsT/QHIp0nIrEeoOVBcrlk9RdW4h5HOofb25Dn7ToSrQlNxMX4I1vUHr odpCnqWh3GPLjc2YahyDHPVnVGc1KvhGqLDq9rFoIJTWyE4fn+OfXubH+1xeL0jAxTgy A71mZTVbhR9Ogn6TxZOs/Qwmrxlm5Uwdd7YOliOawNM7c1PaSELyuQM9HQMwwuKDEDOu 1g== Received: from ma1-mtap-s03.corp.apple.com (ma1-mtap-s03.corp.apple.com [17.40.76.7]) by nwk-aaemail-lapp02.apple.com with ESMTP id 2xkyfq8e4a-2 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NO); Tue, 21 Jan 2020 16:57:03 -0800 Received: from nwk-mmpp-sz13.apple.com (nwk-mmpp-sz13.apple.com [17.128.115.216]) by ma1-mtap-s03.corp.apple.com (Oracle Communications Messaging Server 8.0.2.4.20190507 64bit (built May 7 2019)) with ESMTPS id <0Q4H00064HAZD600@ma1-mtap-s03.corp.apple.com>; Tue, 21 Jan 2020 16:57:02 -0800 (PST) Received: from process_milters-daemon.nwk-mmpp-sz13.apple.com by nwk-mmpp-sz13.apple.com (Oracle Communications Messaging Server 8.0.2.4.20190507 64bit (built May 7 2019)) id <0Q4H00100GR2PR00@nwk-mmpp-sz13.apple.com>; Tue, 21 Jan 2020 16:57:00 -0800 (PST) X-Va-A: X-Va-T-CD: 4b1e0bf36502e052fc75ad21b706ed24 X-Va-E-CD: c3a95c48df57998eb91288aafc2efb36 X-Va-R-CD: baea9a46c402a2695110c874d5435c96 X-Va-CD: 0 X-Va-ID: ee7c28f0-e8e4-458f-9b26-765dd371cbc6 X-V-A: X-V-T-CD: 4b1e0bf36502e052fc75ad21b706ed24 X-V-E-CD: c3a95c48df57998eb91288aafc2efb36 X-V-R-CD: baea9a46c402a2695110c874d5435c96 X-V-CD: 0 X-V-ID: 257ac40d-f07b-49cb-810e-5aa818a2b79c X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:,, definitions=2020-01-17_05:,, signatures=0 Received: from localhost ([17.192.155.241]) by nwk-mmpp-sz13.apple.com (Oracle Communications Messaging Server 8.0.2.4.20190507 64bit (built May 7 2019)) with ESMTPSA id <0Q4H0011VHAZ4Y50@nwk-mmpp-sz13.apple.com>; Tue, 21 Jan 2020 16:56:59 -0800 (PST) Sender: cpaasch@apple.com From: Christoph Paasch To: netdev@vger.kernel.org Date: Tue, 21 Jan 2020 16:56:19 -0800 Message-id: <20200122005633.21229-6-cpaasch@apple.com> X-Mailer: git-send-email 2.23.0 In-reply-to: <20200122005633.21229-1-cpaasch@apple.com> References: <20200122005633.21229-1-cpaasch@apple.com> MIME-version: 1.0 X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:, , definitions=2020-01-17_05:, , signatures=0 Message-ID-Hash: PEDLCEFCIGMEXHOWUVVWGSNV42KDQDWS X-Message-ID-Hash: PEDLCEFCIGMEXHOWUVVWGSNV42KDQDWS X-MailFrom: cpaasch@apple.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; suspicious-header CC: mptcp@lists.01.org X-Mailman-Version: 3.1.1 Precedence: list Subject: [MPTCP] [PATCH net-next v3 05/19] mptcp: Create SUBFLOW socket for incoming connections List-Id: Discussions regarding MPTCP upstreaming Archived-At: List-Archive: List-Help: List-Post: List-Subscribe: List-Unsubscribe: From: Peter Krystad Add subflow_request_sock type that extends tcp_request_sock and add an is_mptcp flag to tcp_request_sock distinguish them. Override the listen() and accept() methods of the MPTCP socket proto_ops so they may act on the subflow socket. Override the conn_request() and syn_recv_sock() handlers in the inet_connection_sock to handle incoming MPTCP SYNs and the ACK to the response SYN. Add handling in tcp_output.c to add MP_CAPABLE to an outgoing SYN-ACK response for a subflow_request_sock. Co-developed-by: Davide Caratti Signed-off-by: Davide Caratti Co-developed-by: Florian Westphal Signed-off-by: Florian Westphal Co-developed-by: Matthieu Baerts Signed-off-by: Matthieu Baerts Co-developed-by: Paolo Abeni Signed-off-by: Paolo Abeni Signed-off-by: Peter Krystad Signed-off-by: Christoph Paasch --- net/mptcp/protocol.c | 236 ++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 231 insertions(+), 5 deletions(-) diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c index bdd58da1e4f6..e08a25eabcd5 100644 --- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -14,6 +14,9 @@ #include #include #include +#if IS_ENABLED(CONFIG_MPTCP_IPV6) +#include +#endif #include #include "protocol.h" @@ -212,6 +215,90 @@ static void mptcp_close(struct sock *sk, long timeout) sk_common_release(sk); } +static void mptcp_copy_inaddrs(struct sock *msk, const struct sock *ssk) +{ +#if IS_ENABLED(CONFIG_MPTCP_IPV6) + const struct ipv6_pinfo *ssk6 = inet6_sk(ssk); + struct ipv6_pinfo *msk6 = inet6_sk(msk); + + msk->sk_v6_daddr = ssk->sk_v6_daddr; + msk->sk_v6_rcv_saddr = ssk->sk_v6_rcv_saddr; + + if (msk6 && ssk6) { + msk6->saddr = ssk6->saddr; + msk6->flow_label = ssk6->flow_label; + } +#endif + + inet_sk(msk)->inet_num = inet_sk(ssk)->inet_num; + inet_sk(msk)->inet_dport = inet_sk(ssk)->inet_dport; + inet_sk(msk)->inet_sport = inet_sk(ssk)->inet_sport; + inet_sk(msk)->inet_daddr = inet_sk(ssk)->inet_daddr; + inet_sk(msk)->inet_saddr = inet_sk(ssk)->inet_saddr; + inet_sk(msk)->inet_rcv_saddr = inet_sk(ssk)->inet_rcv_saddr; +} + +static struct sock *mptcp_accept(struct sock *sk, int flags, int *err, + bool kern) +{ + struct mptcp_sock *msk = mptcp_sk(sk); + struct socket *listener; + struct sock *newsk; + + listener = __mptcp_nmpc_socket(msk); + if (WARN_ON_ONCE(!listener)) { + *err = -EINVAL; + return NULL; + } + + pr_debug("msk=%p, listener=%p", msk, mptcp_subflow_ctx(listener->sk)); + newsk = inet_csk_accept(listener->sk, flags, err, kern); + if (!newsk) + return NULL; + + pr_debug("msk=%p, subflow is mptcp=%d", msk, sk_is_mptcp(newsk)); + + if (sk_is_mptcp(newsk)) { + struct mptcp_subflow_context *subflow; + struct sock *new_mptcp_sock; + struct sock *ssk = newsk; + + subflow = mptcp_subflow_ctx(newsk); + lock_sock(sk); + + local_bh_disable(); + new_mptcp_sock = sk_clone_lock(sk, GFP_ATOMIC); + if (!new_mptcp_sock) { + *err = -ENOBUFS; + local_bh_enable(); + release_sock(sk); + tcp_close(newsk, 0); + return NULL; + } + + mptcp_init_sock(new_mptcp_sock); + + msk = mptcp_sk(new_mptcp_sock); + msk->remote_key = subflow->remote_key; + msk->local_key = subflow->local_key; + msk->subflow = NULL; + + newsk = new_mptcp_sock; + mptcp_copy_inaddrs(newsk, ssk); + list_add(&subflow->node, &msk->conn_list); + + /* will be fully established at mptcp_stream_accept() + * completion. + */ + inet_sk_state_store(new_mptcp_sock, TCP_SYN_RECV); + bh_unlock_sock(new_mptcp_sock); + local_bh_enable(); + release_sock(sk); + } + + return newsk; +} + static int mptcp_get_port(struct sock *sk, unsigned short snum) { struct mptcp_sock *msk = mptcp_sk(sk); @@ -246,12 +333,21 @@ void mptcp_finish_connect(struct sock *ssk) WRITE_ONCE(msk->local_key, subflow->local_key); } +static void mptcp_sock_graft(struct sock *sk, struct socket *parent) +{ + write_lock_bh(&sk->sk_callback_lock); + rcu_assign_pointer(sk->sk_wq, &parent->wq); + sk_set_socket(sk, parent); + sk->sk_uid = SOCK_INODE(parent)->i_uid; + write_unlock_bh(&sk->sk_callback_lock); +} + static struct proto mptcp_prot = { .name = "MPTCP", .owner = THIS_MODULE, .init = mptcp_init_sock, .close = mptcp_close, - .accept = inet_csk_accept, + .accept = mptcp_accept, .shutdown = tcp_shutdown, .sendmsg = mptcp_sendmsg, .recvmsg = mptcp_recvmsg, @@ -266,10 +362,7 @@ static int mptcp_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len) { struct mptcp_sock *msk = mptcp_sk(sock->sk); struct socket *ssock; - int err = -ENOTSUPP; - - if (uaddr->sa_family != AF_INET) // @@ allow only IPv4 for now - return err; + int err; lock_sock(sock->sk); ssock = __mptcp_socket_create(msk, MPTCP_SAME_STATE); @@ -279,6 +372,8 @@ static int mptcp_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len) } err = ssock->ops->bind(ssock, uaddr, addr_len); + if (!err) + mptcp_copy_inaddrs(sock->sk, ssock->sk); unlock: release_sock(sock->sk); @@ -299,14 +394,139 @@ static int mptcp_stream_connect(struct socket *sock, struct sockaddr *uaddr, goto unlock; } +#ifdef CONFIG_TCP_MD5SIG + /* no MPTCP if MD5SIG is enabled on this socket or we may run out of + * TCP option space. + */ + if (rcu_access_pointer(tcp_sk(ssock->sk)->md5sig_info)) + mptcp_subflow_ctx(ssock->sk)->request_mptcp = 0; +#endif + err = ssock->ops->connect(ssock, uaddr, addr_len, flags); inet_sk_state_store(sock->sk, inet_sk_state_load(ssock->sk)); + mptcp_copy_inaddrs(sock->sk, ssock->sk); unlock: release_sock(sock->sk); return err; } +static int mptcp_v4_getname(struct socket *sock, struct sockaddr *uaddr, + int peer) +{ + if (sock->sk->sk_prot == &tcp_prot) { + /* we are being invoked from __sys_accept4, after + * mptcp_accept() has just accepted a non-mp-capable + * flow: sk is a tcp_sk, not an mptcp one. + * + * Hand the socket over to tcp so all further socket ops + * bypass mptcp. + */ + sock->ops = &inet_stream_ops; + } + + return inet_getname(sock, uaddr, peer); +} + +#if IS_ENABLED(CONFIG_MPTCP_IPV6) +static int mptcp_v6_getname(struct socket *sock, struct sockaddr *uaddr, + int peer) +{ + if (sock->sk->sk_prot == &tcpv6_prot) { + /* we are being invoked from __sys_accept4 after + * mptcp_accept() has accepted a non-mp-capable + * subflow: sk is a tcp_sk, not mptcp. + * + * Hand the socket over to tcp so all further + * socket ops bypass mptcp. + */ + sock->ops = &inet6_stream_ops; + } + + return inet6_getname(sock, uaddr, peer); +} +#endif + +static int mptcp_listen(struct socket *sock, int backlog) +{ + struct mptcp_sock *msk = mptcp_sk(sock->sk); + struct socket *ssock; + int err; + + pr_debug("msk=%p", msk); + + lock_sock(sock->sk); + ssock = __mptcp_socket_create(msk, TCP_LISTEN); + if (IS_ERR(ssock)) { + err = PTR_ERR(ssock); + goto unlock; + } + + err = ssock->ops->listen(ssock, backlog); + inet_sk_state_store(sock->sk, inet_sk_state_load(ssock->sk)); + if (!err) + mptcp_copy_inaddrs(sock->sk, ssock->sk); + +unlock: + release_sock(sock->sk); + return err; +} + +static bool is_tcp_proto(const struct proto *p) +{ +#if IS_ENABLED(CONFIG_MPTCP_IPV6) + return p == &tcp_prot || p == &tcpv6_prot; +#else + return p == &tcp_prot; +#endif +} + +static int mptcp_stream_accept(struct socket *sock, struct socket *newsock, + int flags, bool kern) +{ + struct mptcp_sock *msk = mptcp_sk(sock->sk); + struct socket *ssock; + int err; + + pr_debug("msk=%p", msk); + + lock_sock(sock->sk); + if (sock->sk->sk_state != TCP_LISTEN) + goto unlock_fail; + + ssock = __mptcp_nmpc_socket(msk); + if (!ssock) + goto unlock_fail; + + sock_hold(ssock->sk); + release_sock(sock->sk); + + err = ssock->ops->accept(sock, newsock, flags, kern); + if (err == 0 && !is_tcp_proto(newsock->sk->sk_prot)) { + struct mptcp_sock *msk = mptcp_sk(newsock->sk); + struct mptcp_subflow_context *subflow; + + /* set ssk->sk_socket of accept()ed flows to mptcp socket. + * This is needed so NOSPACE flag can be set from tcp stack. + */ + list_for_each_entry(subflow, &msk->conn_list, node) { + struct sock *ssk = mptcp_subflow_tcp_sock(subflow); + + if (!ssk->sk_socket) + mptcp_sock_graft(ssk, newsock); + } + + inet_sk_state_store(newsock->sk, TCP_ESTABLISHED); + } + + sock_put(ssock->sk); + return err; + +unlock_fail: + release_sock(sock->sk); + return -EINVAL; +} + static __poll_t mptcp_poll(struct file *file, struct socket *sock, struct poll_table_struct *wait) { @@ -332,6 +552,9 @@ void __init mptcp_init(void) mptcp_stream_ops.bind = mptcp_bind; mptcp_stream_ops.connect = mptcp_stream_connect; mptcp_stream_ops.poll = mptcp_poll; + mptcp_stream_ops.accept = mptcp_stream_accept; + mptcp_stream_ops.getname = mptcp_v4_getname; + mptcp_stream_ops.listen = mptcp_listen; mptcp_subflow_init(); @@ -371,6 +594,9 @@ int mptcpv6_init(void) mptcp_v6_stream_ops.bind = mptcp_bind; mptcp_v6_stream_ops.connect = mptcp_stream_connect; mptcp_v6_stream_ops.poll = mptcp_poll; + mptcp_v6_stream_ops.accept = mptcp_stream_accept; + mptcp_v6_stream_ops.getname = mptcp_v6_getname; + mptcp_v6_stream_ops.listen = mptcp_listen; err = inet6_register_protosw(&mptcp_v6_protosw); if (err) From patchwork Wed Jan 22 00:56:20 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Christoph Paasch X-Patchwork-Id: 1226869 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (no SPF record) smtp.mailfrom=lists.01.org (client-ip=198.145.21.10; helo=ml01.01.org; envelope-from=mptcp-bounces@lists.01.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=quarantine dis=none) header.from=apple.com Authentication-Results: ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=apple.com header.i=@apple.com header.a=rsa-sha256 header.s=20180706 header.b=o+mWsLCW; dkim-atps=neutral Received: from ml01.01.org (ml01.01.org [198.145.21.10]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 482Rpn0SKCz9sRk for ; Wed, 22 Jan 2020 11:58:09 +1100 (AEDT) Received: from ml01.vlan13.01.org (localhost [IPv6:::1]) by ml01.01.org (Postfix) with ESMTP id D41EE1007B8E5; Tue, 21 Jan 2020 17:01:25 -0800 (PST) Received-SPF: Pass (mailfrom) identity=mailfrom; client-ip=17.151.62.68; helo=nwk-aaemail-lapp03.apple.com; envelope-from=cpaasch@apple.com; receiver= Received: from nwk-aaemail-lapp03.apple.com (nwk-aaemail-lapp03.apple.com [17.151.62.68]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ml01.01.org (Postfix) with ESMTPS id 3FF631007B8C3 for ; Tue, 21 Jan 2020 17:01:24 -0800 (PST) Received: from pps.filterd (nwk-aaemail-lapp03.apple.com [127.0.0.1]) by nwk-aaemail-lapp03.apple.com (8.16.0.27/8.16.0.27) with SMTP id 00M0urnV013197; Tue, 21 Jan 2020 16:57:05 -0800 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=apple.com; h=mime-version : content-transfer-encoding : content-type : sender : from : to : cc : subject : date : message-id : in-reply-to : references; s=20180706; bh=3Ptb5LhGEm6ejujC94kElIb8Z//RsMWJgs6ThCk3eO0=; b=o+mWsLCWYo/9GC7zzUoVozkJPGY9ocop3vkmzDCzzo76m3Zzr4sXEYWcvyv3EzGv3ZF2 Wu6w5HCKkbO07teYQ2ue1m4hGpt7tUAVJFHoxbz7gnEPY6VkC23LooudzzzsmEQRUsXZ ywOMZeM/pnus39BPVBxNDf1oFABh4DXs0lZHyjgdA9ccwTBVT0mXcUALRG5D+lMKL0EY vJpb6jQMott8+RoZYvjP+dwe49l7uoE7RXOJNOvMOTLc2AaiFicS1WkcngdTnjPxjBce Pb8v5W8+VpfQU2a85S66XUb4ZSJc1Hd4EUCwpC8sBCul41NnVy+uhzO9tvRRbMyQtkeP 1w== Received: from ma1-mtap-s01.corp.apple.com (ma1-mtap-s01.corp.apple.com [17.40.76.5]) by nwk-aaemail-lapp03.apple.com with ESMTP id 2xmk4p1699-2 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NO); Tue, 21 Jan 2020 16:57:05 -0800 Received: from nwk-mmpp-sz13.apple.com (nwk-mmpp-sz13.apple.com [17.128.115.216]) by ma1-mtap-s01.corp.apple.com (Oracle Communications Messaging Server 8.0.2.4.20190507 64bit (built May 7 2019)) with ESMTPS id <0Q4H00HWQHAUJ720@ma1-mtap-s01.corp.apple.com>; Tue, 21 Jan 2020 16:57:02 -0800 (PST) Received: from process_milters-daemon.nwk-mmpp-sz13.apple.com by nwk-mmpp-sz13.apple.com (Oracle Communications Messaging Server 8.0.2.4.20190507 64bit (built May 7 2019)) id <0Q4H00100GR2PR00@nwk-mmpp-sz13.apple.com>; Tue, 21 Jan 2020 16:57:00 -0800 (PST) X-Va-A: X-Va-T-CD: 4b1e0bf36502e052fc75ad21b706ed24 X-Va-E-CD: 25f8c37576da6baeedc9284dc0dc2333 X-Va-R-CD: 16e1daedfa7ee51e002803f96974a4c5 X-Va-CD: 0 X-Va-ID: 4a0df259-c6d7-4a61-ab9d-52bcbd26ed74 X-V-A: X-V-T-CD: 4b1e0bf36502e052fc75ad21b706ed24 X-V-E-CD: 25f8c37576da6baeedc9284dc0dc2333 X-V-R-CD: 16e1daedfa7ee51e002803f96974a4c5 X-V-CD: 0 X-V-ID: f92ed012-60a1-4759-86ca-3d0c79bf8a6f X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:,, definitions=2020-01-17_05:,, signatures=0 MIME-version: 1.0 Received: from localhost ([17.192.155.241]) by nwk-mmpp-sz13.apple.com (Oracle Communications Messaging Server 8.0.2.4.20190507 64bit (built May 7 2019)) with ESMTPSA id <0Q4H0011XHAZ4Y50@nwk-mmpp-sz13.apple.com>; Tue, 21 Jan 2020 16:56:59 -0800 (PST) Sender: cpaasch@apple.com From: Christoph Paasch To: netdev@vger.kernel.org Date: Tue, 21 Jan 2020 16:56:20 -0800 Message-id: <20200122005633.21229-7-cpaasch@apple.com> X-Mailer: git-send-email 2.23.0 In-reply-to: <20200122005633.21229-1-cpaasch@apple.com> References: <20200122005633.21229-1-cpaasch@apple.com> X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:, , definitions=2020-01-17_05:, , signatures=0 Message-ID-Hash: ZU7ZDPMM26LEMK7U7AJAMPASIFWOB2UL X-Message-ID-Hash: ZU7ZDPMM26LEMK7U7AJAMPASIFWOB2UL X-MailFrom: cpaasch@apple.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; suspicious-header CC: mptcp@lists.01.org X-Mailman-Version: 3.1.1 Precedence: list Subject: [MPTCP] [PATCH net-next v3 06/19] mptcp: Add key generation and token tree List-Id: Discussions regarding MPTCP upstreaming Archived-At: List-Archive: List-Help: List-Post: List-Subscribe: List-Unsubscribe: From: Peter Krystad Generate the local keys, IDSN, and token when creating a new socket. Introduce the token tree to track all tokens in use using a radix tree with the MPTCP token itself as the index. Override the rebuild_header callback in inet_connection_sock_af_ops for creating the local key on a new outgoing connection. Override the init_req callback of tcp_request_sock_ops for creating the local key on a new incoming connection. Will be used to obtain the MPTCP parent socket to handle incoming joins. Co-developed-by: Davide Caratti Signed-off-by: Davide Caratti Co-developed-by: Florian Westphal Signed-off-by: Florian Westphal Co-developed-by: Paolo Abeni Signed-off-by: Paolo Abeni Signed-off-by: Peter Krystad Signed-off-by: Christoph Paasch --- net/mptcp/Makefile | 2 +- net/mptcp/crypto.c | 122 +++++++++++++++++++++++++++ net/mptcp/protocol.c | 16 ++++ net/mptcp/protocol.h | 32 +++++++ net/mptcp/subflow.c | 69 +++++++++++++-- net/mptcp/token.c | 195 +++++++++++++++++++++++++++++++++++++++++++ 6 files changed, 428 insertions(+), 8 deletions(-) create mode 100644 net/mptcp/crypto.c create mode 100644 net/mptcp/token.c diff --git a/net/mptcp/Makefile b/net/mptcp/Makefile index e1ee5aade8b0..178ae81d8b66 100644 --- a/net/mptcp/Makefile +++ b/net/mptcp/Makefile @@ -1,4 +1,4 @@ # SPDX-License-Identifier: GPL-2.0 obj-$(CONFIG_MPTCP) += mptcp.o -mptcp-y := protocol.o subflow.o options.o +mptcp-y := protocol.o subflow.o options.o token.o crypto.o diff --git a/net/mptcp/crypto.c b/net/mptcp/crypto.c new file mode 100644 index 000000000000..bbd6d01af211 --- /dev/null +++ b/net/mptcp/crypto.c @@ -0,0 +1,122 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Multipath TCP cryptographic functions + * Copyright (c) 2017 - 2019, Intel Corporation. + * + * Note: This code is based on mptcp_ctrl.c, mptcp_ipv4.c, and + * mptcp_ipv6 from multipath-tcp.org, authored by: + * + * Sébastien Barré + * Christoph Paasch + * Jaakko Korkeaniemi + * Gregory Detal + * Fabien Duchêne + * Andreas Seelinger + * Lavkesh Lahngir + * Andreas Ripke + * Vlad Dogaru + * Octavian Purdila + * John Ronan + * Catalin Nicutar + * Brandon Heller + */ + +#include +#include +#include + +#include "protocol.h" + +struct sha1_state { + u32 workspace[SHA_WORKSPACE_WORDS]; + u32 digest[SHA_DIGEST_WORDS]; + unsigned int count; +}; + +static void sha1_init(struct sha1_state *state) +{ + sha_init(state->digest); + state->count = 0; +} + +static void sha1_update(struct sha1_state *state, u8 *input) +{ + sha_transform(state->digest, input, state->workspace); + state->count += SHA_MESSAGE_BYTES; +} + +static void sha1_pad_final(struct sha1_state *state, u8 *input, + unsigned int length, __be32 *mptcp_hashed_key) +{ + int i; + + input[length] = 0x80; + memset(&input[length + 1], 0, SHA_MESSAGE_BYTES - length - 9); + put_unaligned_be64((length + state->count) << 3, + &input[SHA_MESSAGE_BYTES - 8]); + + sha_transform(state->digest, input, state->workspace); + for (i = 0; i < SHA_DIGEST_WORDS; ++i) + put_unaligned_be32(state->digest[i], &mptcp_hashed_key[i]); + + memzero_explicit(state->workspace, SHA_WORKSPACE_WORDS << 2); +} + +void mptcp_crypto_key_sha(u64 key, u32 *token, u64 *idsn) +{ + __be32 mptcp_hashed_key[SHA_DIGEST_WORDS]; + u8 input[SHA_MESSAGE_BYTES]; + struct sha1_state state; + + sha1_init(&state); + put_unaligned_be64(key, input); + sha1_pad_final(&state, input, 8, mptcp_hashed_key); + + if (token) + *token = be32_to_cpu(mptcp_hashed_key[0]); + if (idsn) + *idsn = be64_to_cpu(*((__be64 *)&mptcp_hashed_key[3])); +} + +void mptcp_crypto_hmac_sha(u64 key1, u64 key2, u32 nonce1, u32 nonce2, + u32 *hash_out) +{ + u8 input[SHA_MESSAGE_BYTES * 2]; + struct sha1_state state; + u8 key1be[8]; + u8 key2be[8]; + int i; + + put_unaligned_be64(key1, key1be); + put_unaligned_be64(key2, key2be); + + /* Generate key xored with ipad */ + memset(input, 0x36, SHA_MESSAGE_BYTES); + for (i = 0; i < 8; i++) + input[i] ^= key1be[i]; + for (i = 0; i < 8; i++) + input[i + 8] ^= key2be[i]; + + put_unaligned_be32(nonce1, &input[SHA_MESSAGE_BYTES]); + put_unaligned_be32(nonce2, &input[SHA_MESSAGE_BYTES + 4]); + + sha1_init(&state); + sha1_update(&state, input); + + /* emit sha256(K1 || msg) on the second input block, so we can + * reuse 'input' for the last hashing + */ + sha1_pad_final(&state, &input[SHA_MESSAGE_BYTES], 8, + (__force __be32 *)&input[SHA_MESSAGE_BYTES]); + + /* Prepare second part of hmac */ + memset(input, 0x5C, SHA_MESSAGE_BYTES); + for (i = 0; i < 8; i++) + input[i] ^= key1be[i]; + for (i = 0; i < 8; i++) + input[i + 8] ^= key2be[i]; + + sha1_init(&state); + sha1_update(&state, input); + sha1_pad_final(&state, &input[SHA_MESSAGE_BYTES], SHA_DIGEST_WORDS << 2, + (__be32 *)hash_out); +} diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c index e08a25eabcd5..3f66b6a3bb28 100644 --- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -201,6 +201,7 @@ static void mptcp_close(struct sock *sk, long timeout) struct mptcp_subflow_context *subflow, *tmp; struct mptcp_sock *msk = mptcp_sk(sk); + mptcp_token_destroy(msk->token); inet_sk_state_store(sk, TCP_CLOSE); lock_sock(sk); @@ -281,8 +282,10 @@ static struct sock *mptcp_accept(struct sock *sk, int flags, int *err, msk = mptcp_sk(new_mptcp_sock); msk->remote_key = subflow->remote_key; msk->local_key = subflow->local_key; + msk->token = subflow->token; msk->subflow = NULL; + mptcp_token_update_accept(newsk, new_mptcp_sock); newsk = new_mptcp_sock; mptcp_copy_inaddrs(newsk, ssk); list_add(&subflow->node, &msk->conn_list); @@ -299,6 +302,10 @@ static struct sock *mptcp_accept(struct sock *sk, int flags, int *err, return newsk; } +static void mptcp_destroy(struct sock *sk) +{ +} + static int mptcp_get_port(struct sock *sk, unsigned short snum) { struct mptcp_sock *msk = mptcp_sk(sk); @@ -331,6 +338,7 @@ void mptcp_finish_connect(struct sock *ssk) */ WRITE_ONCE(msk->remote_key, subflow->remote_key); WRITE_ONCE(msk->local_key, subflow->local_key); + WRITE_ONCE(msk->token, subflow->token); } static void mptcp_sock_graft(struct sock *sk, struct socket *parent) @@ -349,6 +357,7 @@ static struct proto mptcp_prot = { .close = mptcp_close, .accept = mptcp_accept, .shutdown = tcp_shutdown, + .destroy = mptcp_destroy, .sendmsg = mptcp_sendmsg, .recvmsg = mptcp_recvmsg, .hash = inet_hash, @@ -568,6 +577,12 @@ void __init mptcp_init(void) static struct proto_ops mptcp_v6_stream_ops; static struct proto mptcp_v6_prot; +static void mptcp_v6_destroy(struct sock *sk) +{ + mptcp_destroy(sk); + inet6_destroy_sock(sk); +} + static struct inet_protosw mptcp_v6_protosw = { .type = SOCK_STREAM, .protocol = IPPROTO_MPTCP, @@ -583,6 +598,7 @@ int mptcpv6_init(void) mptcp_v6_prot = mptcp_prot; strcpy(mptcp_v6_prot.name, "MPTCPv6"); mptcp_v6_prot.slab = NULL; + mptcp_v6_prot.destroy = mptcp_v6_destroy; mptcp_v6_prot.obj_size = sizeof(struct mptcp_sock) + sizeof(struct ipv6_pinfo); diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h index bd66e7415515..5f43fa0275c0 100644 --- a/net/mptcp/protocol.h +++ b/net/mptcp/protocol.h @@ -7,6 +7,10 @@ #ifndef __MPTCP_PROTOCOL_H #define __MPTCP_PROTOCOL_H +#include +#include +#include + #define MPTCP_SUPPORTED_VERSION 0 /* MPTCP option bits */ @@ -42,6 +46,7 @@ struct mptcp_sock { struct inet_connection_sock sk; u64 local_key; u64 remote_key; + u32 token; struct list_head conn_list; struct socket *subflow; /* outgoing connect/listener/!mp_capable */ }; @@ -61,6 +66,8 @@ struct mptcp_subflow_request_sock { backup : 1; u64 local_key; u64 remote_key; + u64 idsn; + u32 token; }; static inline struct mptcp_subflow_request_sock * @@ -74,6 +81,8 @@ struct mptcp_subflow_context { struct list_head node;/* conn_list of subflows */ u64 local_key; u64 remote_key; + u64 idsn; + u32 token; u32 request_mptcp : 1, /* send MP_CAPABLE */ mp_capable : 1, /* remote is MPTCP capable */ fourth_ack : 1, /* send initial DSS */ @@ -112,4 +121,27 @@ void mptcp_get_options(const struct sk_buff *skb, void mptcp_finish_connect(struct sock *sk); +int mptcp_token_new_request(struct request_sock *req); +void mptcp_token_destroy_request(u32 token); +int mptcp_token_new_connect(struct sock *sk); +int mptcp_token_new_accept(u32 token); +void mptcp_token_update_accept(struct sock *sk, struct sock *conn); +void mptcp_token_destroy(u32 token); + +void mptcp_crypto_key_sha(u64 key, u32 *token, u64 *idsn); +static inline void mptcp_crypto_key_gen_sha(u64 *key, u32 *token, u64 *idsn) +{ + /* we might consider a faster version that computes the key as a + * hash of some information available in the MPTCP socket. Use + * random data at the moment, as it's probably the safest option + * in case multiple sockets are opened in different namespaces at + * the same time. + */ + get_random_bytes(key, sizeof(u64)); + mptcp_crypto_key_sha(*key, token, idsn); +} + +void mptcp_crypto_hmac_sha(u64 key1, u64 key2, u32 nonce1, u32 nonce2, + u32 *hash_out); + #endif /* __MPTCP_PROTOCOL_H */ diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c index df3192305967..89b91bc7a831 100644 --- a/net/mptcp/subflow.c +++ b/net/mptcp/subflow.c @@ -4,6 +4,8 @@ * Copyright (c) 2017 - 2019, Intel Corporation. */ +#define pr_fmt(fmt) "MPTCP: " fmt + #include #include #include @@ -18,6 +20,33 @@ #include #include "protocol.h" +static int subflow_rebuild_header(struct sock *sk) +{ + struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(sk); + int err = 0; + + if (subflow->request_mptcp && !subflow->token) { + pr_debug("subflow=%p", sk); + err = mptcp_token_new_connect(sk); + } + + if (err) + return err; + + return subflow->icsk_af_ops->rebuild_header(sk); +} + +static void subflow_req_destructor(struct request_sock *req) +{ + struct mptcp_subflow_request_sock *subflow_req = mptcp_subflow_rsk(req); + + pr_debug("subflow_req=%p", subflow_req); + + if (subflow_req->mp_capable) + mptcp_token_destroy_request(subflow_req->token); + tcp_request_sock_ops.destructor(req); +} + static void subflow_init_req(struct request_sock *req, const struct sock *sk_listener, struct sk_buff *skb) @@ -42,7 +71,12 @@ static void subflow_init_req(struct request_sock *req, #endif if (rx_opt.mptcp.mp_capable && listener->request_mptcp) { - subflow_req->mp_capable = 1; + int err; + + err = mptcp_token_new_request(req); + if (err == 0) + subflow_req->mp_capable = 1; + subflow_req->remote_key = rx_opt.mptcp.sndr_key; } } @@ -150,16 +184,28 @@ static struct sock *subflow_syn_recv_sock(const struct sock *sk, req_unhash, own_req); if (child && *own_req) { - if (!mptcp_subflow_ctx(child)) { - pr_debug("Closing child socket"); - inet_sk_set_state(child, TCP_CLOSE); - sock_set_flag(child, SOCK_DEAD); - inet_csk_destroy_sock(child); - child = NULL; + struct mptcp_subflow_context *ctx = mptcp_subflow_ctx(child); + + /* we have null ctx on TCP fallback, not fatal on MPC + * handshake + */ + if (!ctx) + return child; + + if (ctx->mp_capable) { + if (mptcp_token_new_accept(ctx->token)) + goto close_child; } } return child; + +close_child: + pr_debug("closing child socket"); + tcp_send_active_reset(child, GFP_ATOMIC); + inet_csk_prepare_forced_close(child); + tcp_done(child); + return NULL; } static struct inet_connection_sock_af_ops subflow_specific; @@ -224,6 +270,7 @@ int mptcp_subflow_create_socket(struct sock *sk, struct socket **new_sock) pr_debug("subflow=%p", subflow); *new_sock = sf; + sock_hold(sk); subflow->conn = sk; return 0; @@ -286,6 +333,9 @@ static void subflow_ulp_release(struct sock *sk) if (!ctx) return; + if (ctx->conn) + sock_put(ctx->conn); + kfree_rcu(ctx, rcu); } @@ -323,6 +373,7 @@ static void subflow_ulp_clone(const struct request_sock *req, new_ctx->fourth_ack = 1; new_ctx->remote_key = subflow_req->remote_key; new_ctx->local_key = subflow_req->local_key; + new_ctx->token = subflow_req->token; } static struct tcp_ulp_ops subflow_ulp_ops __read_mostly = { @@ -346,6 +397,8 @@ static int subflow_ops_init(struct request_sock_ops *subflow_ops) if (!subflow_ops->slab) return -ENOMEM; + subflow_ops->destructor = subflow_req_destructor; + return 0; } @@ -362,6 +415,7 @@ void mptcp_subflow_init(void) subflow_specific.conn_request = subflow_v4_conn_request; subflow_specific.syn_recv_sock = subflow_syn_recv_sock; subflow_specific.sk_rx_dst_set = subflow_finish_connect; + subflow_specific.rebuild_header = subflow_rebuild_header; #if IS_ENABLED(CONFIG_MPTCP_IPV6) subflow_request_sock_ipv6_ops = tcp_request_sock_ipv6_ops; @@ -371,6 +425,7 @@ void mptcp_subflow_init(void) subflow_v6_specific.conn_request = subflow_v6_conn_request; subflow_v6_specific.syn_recv_sock = subflow_syn_recv_sock; subflow_v6_specific.sk_rx_dst_set = subflow_finish_connect; + subflow_v6_specific.rebuild_header = subflow_rebuild_header; subflow_v6m_specific = subflow_v6_specific; subflow_v6m_specific.queue_xmit = ipv4_specific.queue_xmit; diff --git a/net/mptcp/token.c b/net/mptcp/token.c new file mode 100644 index 000000000000..84d887806090 --- /dev/null +++ b/net/mptcp/token.c @@ -0,0 +1,195 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Multipath TCP token management + * Copyright (c) 2017 - 2019, Intel Corporation. + * + * Note: This code is based on mptcp_ctrl.c from multipath-tcp.org, + * authored by: + * + * Sébastien Barré + * Christoph Paasch + * Jaakko Korkeaniemi + * Gregory Detal + * Fabien Duchêne + * Andreas Seelinger + * Lavkesh Lahngir + * Andreas Ripke + * Vlad Dogaru + * Octavian Purdila + * John Ronan + * Catalin Nicutar + * Brandon Heller + */ + +#define pr_fmt(fmt) "MPTCP: " fmt + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include "protocol.h" + +static RADIX_TREE(token_tree, GFP_ATOMIC); +static RADIX_TREE(token_req_tree, GFP_ATOMIC); +static DEFINE_SPINLOCK(token_tree_lock); +static int token_used __read_mostly; + +/** + * mptcp_token_new_request - create new key/idsn/token for subflow_request + * @req - the request socket + * + * This function is called when a new mptcp connection is coming in. + * + * It creates a unique token to identify the new mptcp connection, + * a secret local key and the initial data sequence number (idsn). + * + * Returns 0 on success. + */ +int mptcp_token_new_request(struct request_sock *req) +{ + struct mptcp_subflow_request_sock *subflow_req = mptcp_subflow_rsk(req); + int err; + + while (1) { + u32 token; + + mptcp_crypto_key_gen_sha(&subflow_req->local_key, + &subflow_req->token, + &subflow_req->idsn); + pr_debug("req=%p local_key=%llu, token=%u, idsn=%llu\n", + req, subflow_req->local_key, subflow_req->token, + subflow_req->idsn); + + token = subflow_req->token; + spin_lock_bh(&token_tree_lock); + if (!radix_tree_lookup(&token_req_tree, token) && + !radix_tree_lookup(&token_tree, token)) + break; + spin_unlock_bh(&token_tree_lock); + } + + err = radix_tree_insert(&token_req_tree, + subflow_req->token, &token_used); + spin_unlock_bh(&token_tree_lock); + return err; +} + +/** + * mptcp_token_new_connect - create new key/idsn/token for subflow + * @sk - the socket that will initiate a connection + * + * This function is called when a new outgoing mptcp connection is + * initiated. + * + * It creates a unique token to identify the new mptcp connection, + * a secret local key and the initial data sequence number (idsn). + * + * On success, the mptcp connection can be found again using + * the computed token at a later time, this is needed to process + * join requests. + * + * returns 0 on success. + */ +int mptcp_token_new_connect(struct sock *sk) +{ + struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(sk); + struct sock *mptcp_sock = subflow->conn; + int err; + + while (1) { + u32 token; + + mptcp_crypto_key_gen_sha(&subflow->local_key, &subflow->token, + &subflow->idsn); + + pr_debug("ssk=%p, local_key=%llu, token=%u, idsn=%llu\n", + sk, subflow->local_key, subflow->token, subflow->idsn); + + token = subflow->token; + spin_lock_bh(&token_tree_lock); + if (!radix_tree_lookup(&token_req_tree, token) && + !radix_tree_lookup(&token_tree, token)) + break; + spin_unlock_bh(&token_tree_lock); + } + err = radix_tree_insert(&token_tree, subflow->token, mptcp_sock); + spin_unlock_bh(&token_tree_lock); + + return err; +} + +/** + * mptcp_token_new_accept - insert token for later processing + * @token: the token to insert to the tree + * + * Called when a SYN packet creates a new logical connection, i.e. + * is not a join request. + * + * We don't have an mptcp socket yet at that point. + * This is paired with mptcp_token_update_accept, called on accept(). + */ +int mptcp_token_new_accept(u32 token) +{ + int err; + + spin_lock_bh(&token_tree_lock); + err = radix_tree_insert(&token_tree, token, &token_used); + spin_unlock_bh(&token_tree_lock); + + return err; +} + +/** + * mptcp_token_update_accept - update token to map to mptcp socket + * @conn: the new struct mptcp_sock + * @sk: the initial subflow for this mptcp socket + * + * Called when the first mptcp socket is created on accept to + * refresh the dummy mapping (done to reserve the token) with + * the mptcp_socket structure that wasn't allocated before. + */ +void mptcp_token_update_accept(struct sock *sk, struct sock *conn) +{ + struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(sk); + void __rcu **slot; + + spin_lock_bh(&token_tree_lock); + slot = radix_tree_lookup_slot(&token_tree, subflow->token); + WARN_ON_ONCE(!slot); + if (slot) { + WARN_ON_ONCE(rcu_access_pointer(*slot) != &token_used); + radix_tree_replace_slot(&token_tree, slot, conn); + } + spin_unlock_bh(&token_tree_lock); +} + +/** + * mptcp_token_destroy_request - remove mptcp connection/token + * @token - token of mptcp connection to remove + * + * Remove not-yet-fully-established incoming connection identified + * by @token. + */ +void mptcp_token_destroy_request(u32 token) +{ + spin_lock_bh(&token_tree_lock); + radix_tree_delete(&token_req_tree, token); + spin_unlock_bh(&token_tree_lock); +} + +/** + * mptcp_token_destroy - remove mptcp connection/token + * @token - token of mptcp connection to remove + * + * Remove the connection identified by @token. + */ +void mptcp_token_destroy(u32 token) +{ + spin_lock_bh(&token_tree_lock); + radix_tree_delete(&token_tree, token); + spin_unlock_bh(&token_tree_lock); +} From patchwork Wed Jan 22 00:56:21 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Christoph Paasch X-Patchwork-Id: 1226868 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (no SPF record) smtp.mailfrom=lists.01.org (client-ip=198.145.21.10; helo=ml01.01.org; envelope-from=mptcp-bounces@lists.01.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=quarantine dis=none) header.from=apple.com Authentication-Results: ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=apple.com header.i=@apple.com header.a=rsa-sha256 header.s=20180706 header.b=UGOJ91q+; dkim-atps=neutral Received: from ml01.01.org (ml01.01.org [198.145.21.10]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 482Rpm3999z9sRR for ; Wed, 22 Jan 2020 11:58:08 +1100 (AEDT) Received: from ml01.vlan13.01.org (localhost [IPv6:::1]) by ml01.01.org (Postfix) with ESMTP id AEE2C1007B8C8; Tue, 21 Jan 2020 17:01:24 -0800 (PST) Received-SPF: Pass (mailfrom) identity=mailfrom; client-ip=17.151.62.67; helo=nwk-aaemail-lapp02.apple.com; envelope-from=cpaasch@apple.com; receiver= Received: from nwk-aaemail-lapp02.apple.com (nwk-aaemail-lapp02.apple.com [17.151.62.67]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ml01.01.org (Postfix) with ESMTPS id 60DAC1007B8C3 for ; Tue, 21 Jan 2020 17:01:22 -0800 (PST) Received: from pps.filterd (nwk-aaemail-lapp02.apple.com [127.0.0.1]) by nwk-aaemail-lapp02.apple.com (8.16.0.27/8.16.0.27) with SMTP id 00M0v2eG020009; Tue, 21 Jan 2020 16:57:03 -0800 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=apple.com; h=sender : from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding; s=20180706; bh=Ws5pFalh4RDRIZS1tAT+Y9wtH/sKvA/98SzqjVBUANY=; b=UGOJ91q+P3dOAiX7zQhb+Yacy5S3/Phxr1QL6v73Sh0J3+SiUYfx+BEHbHLNAWvBHdro 3pv2bYga1Yecw7cH8LQEcWLo4U784oBnKemE/OiTagafY6CArpqmXLm4+nWHhNkPr9Ue LBmkcemZgjieCvfHwrY+oBsPwwjDfC8U2nF6XtsBgXCFS03oFdfgfVeNKzBPYRyhwwsf Rt3+NIIfGP+w9fsEh5rCaj8IY7DtooDp4+l1FK6miFxW9iW8FGOdNSzelvU6dbAU8erG XIrWCOmBcafmKPAldsCy+6G7txS6YEpAJvAaG3u8QkKKQn5QjWkVexeEf2c3jVYp2F/h gg== Received: from ma1-mtap-s03.corp.apple.com (ma1-mtap-s03.corp.apple.com [17.40.76.7]) by nwk-aaemail-lapp02.apple.com with ESMTP id 2xkyfq8e4a-3 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NO); Tue, 21 Jan 2020 16:57:03 -0800 Received: from nwk-mmpp-sz13.apple.com (nwk-mmpp-sz13.apple.com [17.128.115.216]) by ma1-mtap-s03.corp.apple.com (Oracle Communications Messaging Server 8.0.2.4.20190507 64bit (built May 7 2019)) with ESMTPS id <0Q4H00064HAZD600@ma1-mtap-s03.corp.apple.com>; Tue, 21 Jan 2020 16:57:02 -0800 (PST) Received: from process_milters-daemon.nwk-mmpp-sz13.apple.com by nwk-mmpp-sz13.apple.com (Oracle Communications Messaging Server 8.0.2.4.20190507 64bit (built May 7 2019)) id <0Q4H00100GR2PR00@nwk-mmpp-sz13.apple.com>; Tue, 21 Jan 2020 16:57:01 -0800 (PST) X-Va-A: X-Va-T-CD: 4b1e0bf36502e052fc75ad21b706ed24 X-Va-E-CD: 989caa25d36f1c7cbaae9b0a0f791d86 X-Va-R-CD: 4cff8adf6547868a0bf984429a97c194 X-Va-CD: 0 X-Va-ID: 304c95f1-bb8b-4755-be17-59e10820accf X-V-A: X-V-T-CD: 4b1e0bf36502e052fc75ad21b706ed24 X-V-E-CD: 989caa25d36f1c7cbaae9b0a0f791d86 X-V-R-CD: 4cff8adf6547868a0bf984429a97c194 X-V-CD: 0 X-V-ID: fc5c6434-f1c7-4461-9bac-a61689a8ef7f X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:,, definitions=2020-01-17_05:,, signatures=0 Received: from localhost ([17.192.155.241]) by nwk-mmpp-sz13.apple.com (Oracle Communications Messaging Server 8.0.2.4.20190507 64bit (built May 7 2019)) with ESMTPSA id <0Q4H0011ZHB04Y50@nwk-mmpp-sz13.apple.com>; Tue, 21 Jan 2020 16:57:00 -0800 (PST) Sender: cpaasch@apple.com From: Christoph Paasch To: netdev@vger.kernel.org Date: Tue, 21 Jan 2020 16:56:21 -0800 Message-id: <20200122005633.21229-8-cpaasch@apple.com> X-Mailer: git-send-email 2.23.0 In-reply-to: <20200122005633.21229-1-cpaasch@apple.com> References: <20200122005633.21229-1-cpaasch@apple.com> MIME-version: 1.0 X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:, , definitions=2020-01-17_05:, , signatures=0 Message-ID-Hash: DOYNRP6WZDOHNUJTNQODT7BGPHP3B5YI X-Message-ID-Hash: DOYNRP6WZDOHNUJTNQODT7BGPHP3B5YI X-MailFrom: cpaasch@apple.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; suspicious-header CC: mptcp@lists.01.org X-Mailman-Version: 3.1.1 Precedence: list Subject: [MPTCP] [PATCH net-next v3 07/19] mptcp: Add shutdown() socket operation List-Id: Discussions regarding MPTCP upstreaming Archived-At: List-Archive: List-Help: List-Post: List-Subscribe: List-Unsubscribe: From: Peter Krystad Call shutdown on all subflows in use on the given socket, or on the fallback socket. Co-developed-by: Florian Westphal Signed-off-by: Florian Westphal Signed-off-by: Peter Krystad Signed-off-by: Christoph Paasch --- net/mptcp/protocol.c | 66 ++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 66 insertions(+) diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c index 3f66b6a3bb28..249b2506a66a 100644 --- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -196,6 +196,29 @@ static int mptcp_init_sock(struct sock *sk) return 0; } +static void mptcp_subflow_shutdown(struct sock *ssk, int how) +{ + lock_sock(ssk); + + switch (ssk->sk_state) { + case TCP_LISTEN: + if (!(how & RCV_SHUTDOWN)) + break; + /* fall through */ + case TCP_SYN_SENT: + tcp_disconnect(ssk, O_NONBLOCK); + break; + default: + ssk->sk_shutdown |= how; + tcp_shutdown(ssk, how); + break; + } + + /* Wake up anyone sleeping in poll. */ + ssk->sk_state_change(ssk); + release_sock(ssk); +} + static void mptcp_close(struct sock *sk, long timeout) { struct mptcp_subflow_context *subflow, *tmp; @@ -273,6 +296,7 @@ static struct sock *mptcp_accept(struct sock *sk, int flags, int *err, *err = -ENOBUFS; local_bh_enable(); release_sock(sk); + mptcp_subflow_shutdown(newsk, SHUT_RDWR + 1); tcp_close(newsk, 0); return NULL; } @@ -544,6 +568,46 @@ static __poll_t mptcp_poll(struct file *file, struct socket *sock, return mask; } +static int mptcp_shutdown(struct socket *sock, int how) +{ + struct mptcp_sock *msk = mptcp_sk(sock->sk); + struct mptcp_subflow_context *subflow; + int ret = 0; + + pr_debug("sk=%p, how=%d", msk, how); + + lock_sock(sock->sk); + + if (how == SHUT_WR || how == SHUT_RDWR) + inet_sk_state_store(sock->sk, TCP_FIN_WAIT1); + + how++; + + if ((how & ~SHUTDOWN_MASK) || !how) { + ret = -EINVAL; + goto out_unlock; + } + + if (sock->state == SS_CONNECTING) { + if ((1 << sock->sk->sk_state) & + (TCPF_SYN_SENT | TCPF_SYN_RECV | TCPF_CLOSE)) + sock->state = SS_DISCONNECTING; + else + sock->state = SS_CONNECTED; + } + + mptcp_for_each_subflow(msk, subflow) { + struct sock *tcp_sk = mptcp_subflow_tcp_sock(subflow); + + mptcp_subflow_shutdown(tcp_sk, how); + } + +out_unlock: + release_sock(sock->sk); + + return ret; +} + static struct proto_ops mptcp_stream_ops; static struct inet_protosw mptcp_protosw = { @@ -564,6 +628,7 @@ void __init mptcp_init(void) mptcp_stream_ops.accept = mptcp_stream_accept; mptcp_stream_ops.getname = mptcp_v4_getname; mptcp_stream_ops.listen = mptcp_listen; + mptcp_stream_ops.shutdown = mptcp_shutdown; mptcp_subflow_init(); @@ -613,6 +678,7 @@ int mptcpv6_init(void) mptcp_v6_stream_ops.accept = mptcp_stream_accept; mptcp_v6_stream_ops.getname = mptcp_v6_getname; mptcp_v6_stream_ops.listen = mptcp_listen; + mptcp_v6_stream_ops.shutdown = mptcp_shutdown; err = inet6_register_protosw(&mptcp_v6_protosw); if (err) From patchwork Wed Jan 22 00:56:22 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Christoph Paasch X-Patchwork-Id: 1226858 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (no SPF record) smtp.mailfrom=lists.01.org (client-ip=198.145.21.10; helo=ml01.01.org; envelope-from=mptcp-bounces@lists.01.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=quarantine dis=none) header.from=apple.com Authentication-Results: ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=apple.com header.i=@apple.com header.a=rsa-sha256 header.s=20180706 header.b=FtbOckj0; dkim-atps=neutral Received: from ml01.01.org (ml01.01.org [198.145.21.10]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 482Rnf2TPHz9sRk for ; Wed, 22 Jan 2020 11:57:10 +1100 (AEDT) Received: from ml01.vlan13.01.org (localhost [IPv6:::1]) by ml01.01.org (Postfix) with ESMTP id 31E131007B8E1; Tue, 21 Jan 2020 17:00:27 -0800 (PST) Received-SPF: Pass (mailfrom) identity=mailfrom; client-ip=17.151.62.66; helo=nwk-aaemail-lapp01.apple.com; envelope-from=cpaasch@apple.com; receiver= Received: from nwk-aaemail-lapp01.apple.com (nwk-aaemail-lapp01.apple.com [17.151.62.66]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ml01.01.org (Postfix) with ESMTPS id B1B191007B8DD for ; Tue, 21 Jan 2020 17:00:24 -0800 (PST) Received: from pps.filterd (nwk-aaemail-lapp01.apple.com [127.0.0.1]) by nwk-aaemail-lapp01.apple.com (8.16.0.27/8.16.0.27) with SMTP id 00M0urWu049855; Tue, 21 Jan 2020 16:57:05 -0800 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=apple.com; h=sender : from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding; s=20180706; bh=Ae2hDe9jeyTuxtxnE1PuRjG94xEJsl+jXkKJJYO/6R8=; b=FtbOckj0vMpYkJJo1uEGwBAEsgXn3ng+dZI3aO7+3kcujkOrkxhfWTcn2b4zLRNlrUlQ 6wGtmL0FZnY3vK/csTr41UlcSn7Ki31SZa5LWBOYZHk9uETzI5fR6fuTLo8JFqH4gcge /XjZ5HVfPfNyRcrFoJH/n9i8Eo2bOBe2WD/T4GcK4cFmupyfDi916U34lf6g8zTMYFyT 93moShhk1LD7iC6PwVSHSRRzmZGDj4W+WZkPiBjF8gwavdzmhkQ0IJPhUXreA+3ZJJmw l/Z6tUWzo+lqoQzztg5i8mLPBulA+89YRP+fLxSQ68A/2AzSs5hEGk+tQ5hkHvLW+TXK XA== Received: from ma1-mtap-s01.corp.apple.com (ma1-mtap-s01.corp.apple.com [17.40.76.5]) by nwk-aaemail-lapp01.apple.com with ESMTP id 2xm1u51ya3-3 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NO); Tue, 21 Jan 2020 16:57:05 -0800 Received: from nwk-mmpp-sz13.apple.com (nwk-mmpp-sz13.apple.com [17.128.115.216]) by ma1-mtap-s01.corp.apple.com (Oracle Communications Messaging Server 8.0.2.4.20190507 64bit (built May 7 2019)) with ESMTPS id <0Q4H00HWQHAUJ720@ma1-mtap-s01.corp.apple.com>; Tue, 21 Jan 2020 16:57:02 -0800 (PST) Received: from process_milters-daemon.nwk-mmpp-sz13.apple.com by nwk-mmpp-sz13.apple.com (Oracle Communications Messaging Server 8.0.2.4.20190507 64bit (built May 7 2019)) id <0Q4H00100GR2PR00@nwk-mmpp-sz13.apple.com>; Tue, 21 Jan 2020 16:57:01 -0800 (PST) X-Va-A: X-Va-T-CD: 4b1e0bf36502e052fc75ad21b706ed24 X-Va-E-CD: dcf48962d7af3bef0553ce6736fcbceb X-Va-R-CD: 34982985ee288f0a9fc84841eeddfdb7 X-Va-CD: 0 X-Va-ID: ad8974cc-ce7b-4fd8-8f9b-5766312d1f97 X-V-A: X-V-T-CD: 4b1e0bf36502e052fc75ad21b706ed24 X-V-E-CD: dcf48962d7af3bef0553ce6736fcbceb X-V-R-CD: 34982985ee288f0a9fc84841eeddfdb7 X-V-CD: 0 X-V-ID: b74b09bc-c712-41d4-b770-8c0f556f136b X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:,, definitions=2020-01-17_05:,, signatures=0 Received: from localhost ([17.192.155.241]) by nwk-mmpp-sz13.apple.com (Oracle Communications Messaging Server 8.0.2.4.20190507 64bit (built May 7 2019)) with ESMTPSA id <0Q4H00DSGHB0DC30@nwk-mmpp-sz13.apple.com>; Tue, 21 Jan 2020 16:57:00 -0800 (PST) Sender: cpaasch@apple.com From: Christoph Paasch To: netdev@vger.kernel.org Date: Tue, 21 Jan 2020 16:56:22 -0800 Message-id: <20200122005633.21229-9-cpaasch@apple.com> X-Mailer: git-send-email 2.23.0 In-reply-to: <20200122005633.21229-1-cpaasch@apple.com> References: <20200122005633.21229-1-cpaasch@apple.com> MIME-version: 1.0 X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:, , definitions=2020-01-17_05:, , signatures=0 Message-ID-Hash: HJWYNB6MSD756DW6B6BO2FAOJR5HV7Z4 X-Message-ID-Hash: HJWYNB6MSD756DW6B6BO2FAOJR5HV7Z4 X-MailFrom: cpaasch@apple.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; suspicious-header CC: mptcp@lists.01.org X-Mailman-Version: 3.1.1 Precedence: list Subject: [MPTCP] [PATCH net-next v3 08/19] mptcp: Add setsockopt()/getsockopt() socket operations List-Id: Discussions regarding MPTCP upstreaming Archived-At: List-Archive: List-Help: List-Post: List-Subscribe: List-Unsubscribe: From: Peter Krystad set/getsockopt behaviour with multiple subflows is undefined. Therefore, for now, we return -EOPNOTSUPP unless we're in fallback mode. Co-developed-by: Paolo Abeni Signed-off-by: Paolo Abeni Signed-off-by: Peter Krystad Signed-off-by: Christoph Paasch --- net/mptcp/protocol.c | 58 ++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 58 insertions(+) diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c index 249b2506a66a..875ca48de4b2 100644 --- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -330,6 +330,62 @@ static void mptcp_destroy(struct sock *sk) { } +static int mptcp_setsockopt(struct sock *sk, int level, int optname, + char __user *uoptval, unsigned int optlen) +{ + struct mptcp_sock *msk = mptcp_sk(sk); + char __kernel *optval; + int ret = -EOPNOTSUPP; + struct socket *ssock; + + /* will be treated as __user in tcp_setsockopt */ + optval = (char __kernel __force *)uoptval; + + pr_debug("msk=%p", msk); + + /* @@ the meaning of setsockopt() when the socket is connected and + * there are multiple subflows is not defined. + */ + lock_sock(sk); + ssock = __mptcp_socket_create(msk, MPTCP_SAME_STATE); + if (!IS_ERR(ssock)) { + pr_debug("subflow=%p", ssock->sk); + ret = kernel_setsockopt(ssock, level, optname, optval, optlen); + } + release_sock(sk); + + return ret; +} + +static int mptcp_getsockopt(struct sock *sk, int level, int optname, + char __user *uoptval, int __user *uoption) +{ + struct mptcp_sock *msk = mptcp_sk(sk); + char __kernel *optval; + int ret = -EOPNOTSUPP; + int __kernel *option; + struct socket *ssock; + + /* will be treated as __user in tcp_getsockopt */ + optval = (char __kernel __force *)uoptval; + option = (int __kernel __force *)uoption; + + pr_debug("msk=%p", msk); + + /* @@ the meaning of getsockopt() when the socket is connected and + * there are multiple subflows is not defined. + */ + lock_sock(sk); + ssock = __mptcp_socket_create(msk, MPTCP_SAME_STATE); + if (!IS_ERR(ssock)) { + pr_debug("subflow=%p", ssock->sk); + ret = kernel_getsockopt(ssock, level, optname, optval, option); + } + release_sock(sk); + + return ret; +} + static int mptcp_get_port(struct sock *sk, unsigned short snum) { struct mptcp_sock *msk = mptcp_sk(sk); @@ -380,6 +436,8 @@ static struct proto mptcp_prot = { .init = mptcp_init_sock, .close = mptcp_close, .accept = mptcp_accept, + .setsockopt = mptcp_setsockopt, + .getsockopt = mptcp_getsockopt, .shutdown = tcp_shutdown, .destroy = mptcp_destroy, .sendmsg = mptcp_sendmsg, From patchwork Wed Jan 22 00:56:23 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Christoph Paasch X-Patchwork-Id: 1226866 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (no SPF record) smtp.mailfrom=lists.01.org (client-ip=2001:19d0:306:5::1; helo=ml01.01.org; envelope-from=mptcp-bounces@lists.01.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=quarantine dis=none) header.from=apple.com Authentication-Results: ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=apple.com header.i=@apple.com header.a=rsa-sha256 header.s=20180706 header.b=LITt33Bj; dkim-atps=neutral Received: from ml01.01.org (ml01.01.org [IPv6:2001:19d0:306:5::1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 482Rpl6xvdz9sRG for ; Wed, 22 Jan 2020 11:58:07 +1100 (AEDT) Received: from ml01.vlan13.01.org (localhost [IPv6:::1]) by ml01.01.org (Postfix) with ESMTP id BCA011007B8E0; Tue, 21 Jan 2020 17:01:24 -0800 (PST) Received-SPF: Pass (mailfrom) identity=mailfrom; client-ip=17.151.62.67; helo=nwk-aaemail-lapp02.apple.com; envelope-from=cpaasch@apple.com; receiver= Received: from nwk-aaemail-lapp02.apple.com (nwk-aaemail-lapp02.apple.com [17.151.62.67]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ml01.01.org (Postfix) with ESMTPS id 6A4631007B8C5 for ; Tue, 21 Jan 2020 17:01:22 -0800 (PST) Received: from pps.filterd (nwk-aaemail-lapp02.apple.com [127.0.0.1]) by nwk-aaemail-lapp02.apple.com (8.16.0.27/8.16.0.27) with SMTP id 00M0v2eH020009; Tue, 21 Jan 2020 16:57:03 -0800 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=apple.com; h=sender : from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding; s=20180706; bh=RDNlq2munBKSuC648mUSP+TR6llZIRDYoUNcBKz7jGM=; b=LITt33BjOPTCyPyLHqQh5Xhnmh1nocOstktk0Xmi6TXHZvE1Pdcey4COrP2sq2GvZBGQ axqlsKCfm1z3W8p5D424NssaXd7PT3/RvPuovT8WLkEfsGpYXIfIvjr2cF10LXGtjaDm A7HpJsypbMCh0MyrdVKKxoCUjlOyZWYnm1+lEJfm/GsSB/yI5nNqzdDHa7SOufAoCRb4 A283WPkH1r0sNLCzrSe8zG94YnkPPUpqyvvgITmyZzuUApctAlZNSfRnl8AKH1NEkSIr HCQqh0qloJ6zgVRwFRihnmqsyA2yIVA0lNtBWKvAgjlSSPQpae7LhTAbQaP9itqRPqmt 1Q== Received: from ma1-mtap-s03.corp.apple.com (ma1-mtap-s03.corp.apple.com [17.40.76.7]) by nwk-aaemail-lapp02.apple.com with ESMTP id 2xkyfq8e4a-4 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NO); Tue, 21 Jan 2020 16:57:03 -0800 Received: from nwk-mmpp-sz13.apple.com (nwk-mmpp-sz13.apple.com [17.128.115.216]) by ma1-mtap-s03.corp.apple.com (Oracle Communications Messaging Server 8.0.2.4.20190507 64bit (built May 7 2019)) with ESMTPS id <0Q4H00064HAZD600@ma1-mtap-s03.corp.apple.com>; Tue, 21 Jan 2020 16:57:02 -0800 (PST) Received: from process_milters-daemon.nwk-mmpp-sz13.apple.com by nwk-mmpp-sz13.apple.com (Oracle Communications Messaging Server 8.0.2.4.20190507 64bit (built May 7 2019)) id <0Q4H00100GR2PR00@nwk-mmpp-sz13.apple.com>; Tue, 21 Jan 2020 16:57:01 -0800 (PST) X-Va-A: X-Va-T-CD: 4b1e0bf36502e052fc75ad21b706ed24 X-Va-E-CD: 70acec450d9b41055be30a71b77bd8f2 X-Va-R-CD: 09063fe5c141b208bfda4f1f74adc6cd X-Va-CD: 0 X-Va-ID: 4c2a987a-c372-4626-8404-524c834ba3ff X-V-A: X-V-T-CD: 4b1e0bf36502e052fc75ad21b706ed24 X-V-E-CD: 70acec450d9b41055be30a71b77bd8f2 X-V-R-CD: 09063fe5c141b208bfda4f1f74adc6cd X-V-CD: 0 X-V-ID: 5da07032-c4ee-4f56-a507-7dc75348228b X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:,, definitions=2020-01-17_05:,, signatures=0 Received: from localhost ([17.192.155.241]) by nwk-mmpp-sz13.apple.com (Oracle Communications Messaging Server 8.0.2.4.20190507 64bit (built May 7 2019)) with ESMTPSA id <0Q4H00123HB04Y50@nwk-mmpp-sz13.apple.com>; Tue, 21 Jan 2020 16:57:00 -0800 (PST) Sender: cpaasch@apple.com From: Christoph Paasch To: netdev@vger.kernel.org Date: Tue, 21 Jan 2020 16:56:23 -0800 Message-id: <20200122005633.21229-10-cpaasch@apple.com> X-Mailer: git-send-email 2.23.0 In-reply-to: <20200122005633.21229-1-cpaasch@apple.com> References: <20200122005633.21229-1-cpaasch@apple.com> MIME-version: 1.0 X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:, , definitions=2020-01-17_05:, , signatures=0 Message-ID-Hash: VVVMQGSFDW667IYPHEZHLAGCBHRI3TQZ X-Message-ID-Hash: VVVMQGSFDW667IYPHEZHLAGCBHRI3TQZ X-MailFrom: cpaasch@apple.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; suspicious-header CC: mptcp@lists.01.org X-Mailman-Version: 3.1.1 Precedence: list Subject: [MPTCP] [PATCH net-next v3 09/19] mptcp: Write MPTCP DSS headers to outgoing data packets List-Id: Discussions regarding MPTCP upstreaming Archived-At: List-Archive: List-Help: List-Post: List-Subscribe: List-Unsubscribe: From: Mat Martineau Per-packet metadata required to write the MPTCP DSS option is written to the skb_ext area. One write to the socket may contain more than one packet of data, which is copied to page fragments and mapped in to MPTCP DSS segments with size determined by the available page fragments and the maximum mapping length allowed by the MPTCP specification. If do_tcp_sendpages() splits a DSS segment in to multiple skbs, that's ok - the later skbs can either have duplicated DSS mapping information or none at all, and the receiver can handle that. The current implementation uses the subflow frag cache and tcp sendpages to avoid excessive code duplication. More work is required to ensure that it works correctly under memory pressure and to support MPTCP-level retransmissions. The MPTCP DSS checksum is not yet implemented. Co-developed-by: Paolo Abeni Signed-off-by: Paolo Abeni Co-developed-by: Peter Krystad Signed-off-by: Peter Krystad Co-developed-by: Florian Westphal Signed-off-by: Florian Westphal Signed-off-by: Mat Martineau Signed-off-by: Christoph Paasch --- include/net/mptcp.h | 1 + net/mptcp/options.c | 155 +++++++++++++++++++++++++++++++++++++++++-- net/mptcp/protocol.c | 116 +++++++++++++++++++++++++++++++- net/mptcp/protocol.h | 20 ++++++ 4 files changed, 286 insertions(+), 6 deletions(-) diff --git a/include/net/mptcp.h b/include/net/mptcp.h index eabc57c3fde4..06dcc665135e 100644 --- a/include/net/mptcp.h +++ b/include/net/mptcp.h @@ -32,6 +32,7 @@ struct mptcp_out_options { u16 suboptions; u64 sndr_key; u64 rcvr_key; + struct mptcp_ext ext_copy; #endif }; diff --git a/net/mptcp/options.c b/net/mptcp/options.c index 52ff2301b68b..5ea127bc7f01 100644 --- a/net/mptcp/options.c +++ b/net/mptcp/options.c @@ -134,13 +134,13 @@ void mptcp_rcv_synsent(struct sock *sk) } } -bool mptcp_established_options(struct sock *sk, struct sk_buff *skb, - unsigned int *size, unsigned int remaining, - struct mptcp_out_options *opts) +static bool mptcp_established_options_mp(struct sock *sk, unsigned int *size, + unsigned int remaining, + struct mptcp_out_options *opts) { struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(sk); - if (subflow->mp_capable && !subflow->fourth_ack) { + if (!subflow->fourth_ack) { opts->suboptions = OPTION_MPTCP_MPC_ACK; opts->sndr_key = subflow->local_key; opts->rcvr_key = subflow->remote_key; @@ -153,6 +153,112 @@ bool mptcp_established_options(struct sock *sk, struct sk_buff *skb, return false; } +static void mptcp_write_data_fin(struct mptcp_subflow_context *subflow, + struct mptcp_ext *ext) +{ + ext->data_fin = 1; + + if (!ext->use_map) { + /* RFC6824 requires a DSS mapping with specific values + * if DATA_FIN is set but no data payload is mapped + */ + ext->use_map = 1; + ext->dsn64 = 1; + ext->data_seq = mptcp_sk(subflow->conn)->write_seq; + ext->subflow_seq = 0; + ext->data_len = 1; + } else { + /* If there's an existing DSS mapping, DATA_FIN consumes + * 1 additional byte of mapping space. + */ + ext->data_len++; + } +} + +static bool mptcp_established_options_dss(struct sock *sk, struct sk_buff *skb, + unsigned int *size, + unsigned int remaining, + struct mptcp_out_options *opts) +{ + struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(sk); + unsigned int dss_size = 0; + struct mptcp_ext *mpext; + struct mptcp_sock *msk; + unsigned int ack_size; + u8 tcp_fin; + + if (skb) { + mpext = mptcp_get_ext(skb); + tcp_fin = TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN; + } else { + mpext = NULL; + tcp_fin = 0; + } + + if (!skb || (mpext && mpext->use_map) || tcp_fin) { + unsigned int map_size; + + map_size = TCPOLEN_MPTCP_DSS_BASE + TCPOLEN_MPTCP_DSS_MAP64; + + remaining -= map_size; + dss_size = map_size; + if (mpext) + opts->ext_copy = *mpext; + + if (skb && tcp_fin && + subflow->conn->sk_state != TCP_ESTABLISHED) + mptcp_write_data_fin(subflow, &opts->ext_copy); + } + + ack_size = TCPOLEN_MPTCP_DSS_ACK64; + + /* Add kind/length/subtype/flag overhead if mapping is not populated */ + if (dss_size == 0) + ack_size += TCPOLEN_MPTCP_DSS_BASE; + + dss_size += ack_size; + + msk = mptcp_sk(mptcp_subflow_ctx(sk)->conn); + if (msk) { + opts->ext_copy.data_ack = msk->ack_seq; + } else { + mptcp_crypto_key_sha(mptcp_subflow_ctx(sk)->remote_key, + NULL, &opts->ext_copy.data_ack); + opts->ext_copy.data_ack++; + } + + opts->ext_copy.ack64 = 1; + opts->ext_copy.use_ack = 1; + + *size = ALIGN(dss_size, 4); + return true; +} + +bool mptcp_established_options(struct sock *sk, struct sk_buff *skb, + unsigned int *size, unsigned int remaining, + struct mptcp_out_options *opts) +{ + unsigned int opt_size = 0; + bool ret = false; + + if (mptcp_established_options_mp(sk, &opt_size, remaining, opts)) + ret = true; + else if (mptcp_established_options_dss(sk, skb, &opt_size, remaining, + opts)) + ret = true; + + /* we reserved enough space for the above options, and exceeding the + * TCP option space would be fatal + */ + if (WARN_ON_ONCE(opt_size > remaining)) + return false; + + *size += opt_size; + remaining -= opt_size; + + return ret; +} + bool mptcp_synack_options(const struct request_sock *req, unsigned int *size, struct mptcp_out_options *opts) { @@ -194,4 +300,45 @@ void mptcp_write_options(__be32 *ptr, struct mptcp_out_options *opts) ptr += 2; } } + + if (opts->ext_copy.use_ack || opts->ext_copy.use_map) { + struct mptcp_ext *mpext = &opts->ext_copy; + u8 len = TCPOLEN_MPTCP_DSS_BASE; + u8 flags = 0; + + if (mpext->use_ack) { + len += TCPOLEN_MPTCP_DSS_ACK64; + flags = MPTCP_DSS_HAS_ACK | MPTCP_DSS_ACK64; + } + + if (mpext->use_map) { + len += TCPOLEN_MPTCP_DSS_MAP64; + + /* Use only 64-bit mapping flags for now, add + * support for optional 32-bit mappings later. + */ + flags |= MPTCP_DSS_HAS_MAP | MPTCP_DSS_DSN64; + if (mpext->data_fin) + flags |= MPTCP_DSS_DATA_FIN; + } + + *ptr++ = htonl((TCPOPT_MPTCP << 24) | + (len << 16) | + (MPTCPOPT_DSS << 12) | + (flags)); + + if (mpext->use_ack) { + put_unaligned_be64(mpext->data_ack, ptr); + ptr += 2; + } + + if (mpext->use_map) { + put_unaligned_be64(mpext->data_seq, ptr); + ptr += 2; + put_unaligned_be32(mpext->subflow_seq, ptr); + ptr += 1; + put_unaligned_be32(mpext->data_len << 16 | + TCPOPT_NOP << 8 | TCPOPT_NOP, ptr); + } + } } diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c index 875ca48de4b2..8cf49193b1c0 100644 --- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -97,12 +97,93 @@ static struct sock *mptcp_subflow_get(const struct mptcp_sock *msk) return NULL; } +static bool mptcp_ext_cache_refill(struct mptcp_sock *msk) +{ + if (!msk->cached_ext) + msk->cached_ext = __skb_ext_alloc(); + + return !!msk->cached_ext; +} + +static int mptcp_sendmsg_frag(struct sock *sk, struct sock *ssk, + struct msghdr *msg, long *timeo) +{ + int mss_now = 0, size_goal = 0, ret = 0; + struct mptcp_sock *msk = mptcp_sk(sk); + struct mptcp_ext *mpext = NULL; + struct page_frag *pfrag; + struct sk_buff *skb; + size_t psize; + + /* use the mptcp page cache so that we can easily move the data + * from one substream to another, but do per subflow memory accounting + */ + pfrag = sk_page_frag(sk); + while (!sk_page_frag_refill(ssk, pfrag) || + !mptcp_ext_cache_refill(msk)) { + ret = sk_stream_wait_memory(ssk, timeo); + if (ret) + return ret; + } + + /* compute copy limit */ + mss_now = tcp_send_mss(ssk, &size_goal, msg->msg_flags); + psize = min_t(int, pfrag->size - pfrag->offset, size_goal); + + pr_debug("left=%zu", msg_data_left(msg)); + psize = copy_page_from_iter(pfrag->page, pfrag->offset, + min_t(size_t, msg_data_left(msg), psize), + &msg->msg_iter); + pr_debug("left=%zu", msg_data_left(msg)); + if (!psize) + return -EINVAL; + + /* Mark the end of the previous write so the beginning of the + * next write (with its own mptcp skb extension data) is not + * collapsed. + */ + skb = tcp_write_queue_tail(ssk); + if (skb) + TCP_SKB_CB(skb)->eor = 1; + + ret = do_tcp_sendpages(ssk, pfrag->page, pfrag->offset, psize, + msg->msg_flags | MSG_SENDPAGE_NOTLAST); + if (ret <= 0) + return ret; + if (unlikely(ret < psize)) + iov_iter_revert(&msg->msg_iter, psize - ret); + + skb = tcp_write_queue_tail(ssk); + mpext = __skb_ext_set(skb, SKB_EXT_MPTCP, msk->cached_ext); + msk->cached_ext = NULL; + + memset(mpext, 0, sizeof(*mpext)); + mpext->data_seq = msk->write_seq; + mpext->subflow_seq = mptcp_subflow_ctx(ssk)->rel_write_seq; + mpext->data_len = ret; + mpext->use_map = 1; + mpext->dsn64 = 1; + + pr_debug("data_seq=%llu subflow_seq=%u data_len=%u dsn64=%d", + mpext->data_seq, mpext->subflow_seq, mpext->data_len, + mpext->dsn64); + + pfrag->offset += ret; + msk->write_seq += ret; + mptcp_subflow_ctx(ssk)->rel_write_seq += ret; + + tcp_push(ssk, msg->msg_flags, mss_now, tcp_sk(ssk)->nonagle, size_goal); + return ret; +} + static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len) { struct mptcp_sock *msk = mptcp_sk(sk); struct socket *ssock; + size_t copied = 0; struct sock *ssk; - int ret; + int ret = 0; + long timeo; if (msg->msg_flags & ~(MSG_MORE | MSG_DONTWAIT | MSG_NOSIGNAL)) return -EOPNOTSUPP; @@ -116,14 +197,29 @@ static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len) return ret; } + timeo = sock_sndtimeo(sk, msg->msg_flags & MSG_DONTWAIT); + ssk = mptcp_subflow_get(msk); if (!ssk) { release_sock(sk); return -ENOTCONN; } - ret = sock_sendmsg(ssk->sk_socket, msg); + pr_debug("conn_list->subflow=%p", ssk); + lock_sock(ssk); + while (msg_data_left(msg)) { + ret = mptcp_sendmsg_frag(sk, ssk, msg, &timeo); + if (ret < 0) + break; + + copied += ret; + } + + if (copied > 0) + ret = copied; + + release_sock(ssk); release_sock(sk); return ret; } @@ -235,6 +331,8 @@ static void mptcp_close(struct sock *sk, long timeout) __mptcp_close_ssk(sk, ssk, subflow, timeout); } + if (msk->cached_ext) + __skb_ext_put(msk->cached_ext); release_sock(sk); sk_common_release(sk); } @@ -286,6 +384,7 @@ static struct sock *mptcp_accept(struct sock *sk, int flags, int *err, struct mptcp_subflow_context *subflow; struct sock *new_mptcp_sock; struct sock *ssk = newsk; + u64 ack_seq; subflow = mptcp_subflow_ctx(newsk); lock_sock(sk); @@ -310,6 +409,12 @@ static struct sock *mptcp_accept(struct sock *sk, int flags, int *err, msk->subflow = NULL; mptcp_token_update_accept(newsk, new_mptcp_sock); + + mptcp_crypto_key_sha(msk->remote_key, NULL, &ack_seq); + msk->write_seq = subflow->idsn + 1; + ack_seq++; + msk->ack_seq = ack_seq; + subflow->rel_write_seq = 1; newsk = new_mptcp_sock; mptcp_copy_inaddrs(newsk, ssk); list_add(&subflow->node, &msk->conn_list); @@ -404,6 +509,7 @@ void mptcp_finish_connect(struct sock *ssk) struct mptcp_subflow_context *subflow; struct mptcp_sock *msk; struct sock *sk; + u64 ack_seq; subflow = mptcp_subflow_ctx(ssk); @@ -413,12 +519,18 @@ void mptcp_finish_connect(struct sock *ssk) sk = subflow->conn; msk = mptcp_sk(sk); + mptcp_crypto_key_sha(subflow->remote_key, NULL, &ack_seq); + ack_seq++; + subflow->rel_write_seq = 1; + /* the socket is not connected yet, no msk/subflow ops can access/race * accessing the field below */ WRITE_ONCE(msk->remote_key, subflow->remote_key); WRITE_ONCE(msk->local_key, subflow->local_key); WRITE_ONCE(msk->token, subflow->token); + WRITE_ONCE(msk->write_seq, subflow->idsn + 1); + WRITE_ONCE(msk->ack_seq, ack_seq); } static void mptcp_sock_graft(struct sock *sk, struct socket *parent) diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h index 5f43fa0275c0..384ec4804198 100644 --- a/net/mptcp/protocol.h +++ b/net/mptcp/protocol.h @@ -32,6 +32,10 @@ #define TCPOLEN_MPTCP_MPC_SYN 12 #define TCPOLEN_MPTCP_MPC_SYNACK 12 #define TCPOLEN_MPTCP_MPC_ACK 20 +#define TCPOLEN_MPTCP_DSS_BASE 4 +#define TCPOLEN_MPTCP_DSS_ACK64 8 +#define TCPOLEN_MPTCP_DSS_MAP64 14 +#define TCPOLEN_MPTCP_DSS_CHECKSUM 2 /* MPTCP MP_CAPABLE flags */ #define MPTCP_VERSION_MASK (0x0F) @@ -40,14 +44,24 @@ #define MPTCP_CAP_HMAC_SHA1 BIT(0) #define MPTCP_CAP_FLAG_MASK (0x3F) +/* MPTCP DSS flags */ +#define MPTCP_DSS_DATA_FIN BIT(4) +#define MPTCP_DSS_DSN64 BIT(3) +#define MPTCP_DSS_HAS_MAP BIT(2) +#define MPTCP_DSS_ACK64 BIT(1) +#define MPTCP_DSS_HAS_ACK BIT(0) + /* MPTCP connection sock */ struct mptcp_sock { /* inet_connection_sock must be the first member */ struct inet_connection_sock sk; u64 local_key; u64 remote_key; + u64 write_seq; + u64 ack_seq; u32 token; struct list_head conn_list; + struct skb_ext *cached_ext; /* for the next sendmsg */ struct socket *subflow; /* outgoing connect/listener/!mp_capable */ }; @@ -83,6 +97,7 @@ struct mptcp_subflow_context { u64 remote_key; u64 idsn; u32 token; + u32 rel_write_seq; u32 request_mptcp : 1, /* send MP_CAPABLE */ mp_capable : 1, /* remote is MPTCP capable */ fourth_ack : 1, /* send initial DSS */ @@ -144,4 +159,9 @@ static inline void mptcp_crypto_key_gen_sha(u64 *key, u32 *token, u64 *idsn) void mptcp_crypto_hmac_sha(u64 key1, u64 key2, u32 nonce1, u32 nonce2, u32 *hash_out); +static inline struct mptcp_ext *mptcp_get_ext(struct sk_buff *skb) +{ + return (struct mptcp_ext *)skb_ext_find(skb, SKB_EXT_MPTCP); +} + #endif /* __MPTCP_PROTOCOL_H */ From patchwork Wed Jan 22 00:56:24 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Christoph Paasch X-Patchwork-Id: 1226859 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (no SPF record) smtp.mailfrom=lists.01.org (client-ip=2001:19d0:306:5::1; helo=ml01.01.org; envelope-from=mptcp-bounces@lists.01.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=quarantine dis=none) header.from=apple.com Authentication-Results: ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=apple.com header.i=@apple.com header.a=rsa-sha256 header.s=20180706 header.b=c+BTKVwZ; dkim-atps=neutral Received: from ml01.01.org (ml01.01.org [IPv6:2001:19d0:306:5::1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 482Rnf37Xcz9sRl for ; Wed, 22 Jan 2020 11:57:10 +1100 (AEDT) Received: from ml01.vlan13.01.org (localhost [IPv6:::1]) by ml01.01.org (Postfix) with ESMTP id 3BCE81007B8F3; Tue, 21 Jan 2020 17:00:27 -0800 (PST) Received-SPF: Pass (mailfrom) identity=mailfrom; client-ip=17.171.2.60; helo=ma1-aaemail-dr-lapp01.apple.com; envelope-from=cpaasch@apple.com; receiver= Received: from ma1-aaemail-dr-lapp01.apple.com (ma1-aaemail-dr-lapp01.apple.com [17.171.2.60]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ml01.01.org (Postfix) with ESMTPS id C3ECC1007B8DD for ; Tue, 21 Jan 2020 17:00:24 -0800 (PST) Received: from pps.filterd (ma1-aaemail-dr-lapp01.apple.com [127.0.0.1]) by ma1-aaemail-dr-lapp01.apple.com (8.16.0.27/8.16.0.27) with SMTP id 00M0v12h016568; Tue, 21 Jan 2020 16:57:04 -0800 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=apple.com; h=sender : from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding; s=20180706; bh=P4HYqWnPW3LM4NsyheBoYc5p3humRdmUtwWamNPKG6E=; b=c+BTKVwZaOeG+RuQg1mh9xi/UWi42TSJQnA2Oeq4xHRTE0sVOu5SQKZAxaBUd7Be1vAQ KPMu7lEyFNkRTPASxFUOBa5zjuF3rTV9DPXp87gK4/wiRS2ZaummhykskEnKnDSLb5qA YLUS4Xdf4saSFHjLgZil86JDDwFvUNnLmLj10nBlcvzFyWzJM+PM1Xje+aSOseU/V4AH J6oacuHtbaBNL1jwYjcRkXpBy8jKqtyAMzey/HP3Z/Hf44EXEpuy9kJEnm19RSjmvZT9 qjXv86DZoRJgYgd8yKGkFiHcydEbbE50JHcnKC0xyNI5thlhSnnzHPcEi1Ea/ORbQWDc Rw== Received: from mr2-mtap-s02.rno.apple.com (mr2-mtap-s02.rno.apple.com [17.179.226.134]) by ma1-aaemail-dr-lapp01.apple.com with ESMTP id 2xm1j8ksrg-8 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NO); Tue, 21 Jan 2020 16:57:03 -0800 Received: from nwk-mmpp-sz13.apple.com (nwk-mmpp-sz13.apple.com [17.128.115.216]) by mr2-mtap-s02.rno.apple.com (Oracle Communications Messaging Server 8.0.2.4.20190507 64bit (built May 7 2019)) with ESMTPS id <0Q4H00ERMHB31200@mr2-mtap-s02.rno.apple.com>; Tue, 21 Jan 2020 16:57:03 -0800 (PST) Received: from process_milters-daemon.nwk-mmpp-sz13.apple.com by nwk-mmpp-sz13.apple.com (Oracle Communications Messaging Server 8.0.2.4.20190507 64bit (built May 7 2019)) id <0Q4H00100GR2PR00@nwk-mmpp-sz13.apple.com>; Tue, 21 Jan 2020 16:57:03 -0800 (PST) X-Va-A: X-Va-T-CD: 4b1e0bf36502e052fc75ad21b706ed24 X-Va-E-CD: ac8e93af83cdded18ecd24b166711788 X-Va-R-CD: 68f9cb94711d0221451738a98b442b3b X-Va-CD: 0 X-Va-ID: 78f5d884-080b-4c8f-833f-440a6072fe61 X-V-A: X-V-T-CD: 4b1e0bf36502e052fc75ad21b706ed24 X-V-E-CD: ac8e93af83cdded18ecd24b166711788 X-V-R-CD: 68f9cb94711d0221451738a98b442b3b X-V-CD: 0 X-V-ID: 5f155179-2ad4-4375-8e8d-28fa2b39fc42 X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:,, definitions=2020-01-17_05:,, signatures=0 Received: from localhost ([17.192.155.241]) by nwk-mmpp-sz13.apple.com (Oracle Communications Messaging Server 8.0.2.4.20190507 64bit (built May 7 2019)) with ESMTPSA id <0Q4H00125HB04Y50@nwk-mmpp-sz13.apple.com>; Tue, 21 Jan 2020 16:57:00 -0800 (PST) Sender: cpaasch@apple.com From: Christoph Paasch To: netdev@vger.kernel.org Date: Tue, 21 Jan 2020 16:56:24 -0800 Message-id: <20200122005633.21229-11-cpaasch@apple.com> X-Mailer: git-send-email 2.23.0 In-reply-to: <20200122005633.21229-1-cpaasch@apple.com> References: <20200122005633.21229-1-cpaasch@apple.com> MIME-version: 1.0 X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:, , definitions=2020-01-17_05:, , signatures=0 Message-ID-Hash: UCPZTBPV5MWFNEHYTKISFGE4MMBEFIZZ X-Message-ID-Hash: UCPZTBPV5MWFNEHYTKISFGE4MMBEFIZZ X-MailFrom: cpaasch@apple.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; suspicious-header CC: mptcp@lists.01.org X-Mailman-Version: 3.1.1 Precedence: list Subject: [MPTCP] [PATCH net-next v3 10/19] mptcp: Implement MPTCP receive path List-Id: Discussions regarding MPTCP upstreaming Archived-At: List-Archive: List-Help: List-Post: List-Subscribe: List-Unsubscribe: From: Mat Martineau Parses incoming DSS options and populates outgoing MPTCP ACK fields. MPTCP fields are parsed from the TCP option header and placed in an skb extension, allowing the upper MPTCP layer to access MPTCP options after the skb has gone through the TCP stack. The subflow implements its own data_ready() ops, which ensures that the pending data is in sequence - according to MPTCP seq number - dropping out-of-seq skbs. The DATA_READY bit flag is set if this is the case. This allows the MPTCP socket layer to determine if more data is available without having to consult the individual subflows. It additionally validates the current mapping and propagates EoF events to the connection socket. Co-developed-by: Paolo Abeni Signed-off-by: Paolo Abeni Co-developed-by: Peter Krystad Signed-off-by: Peter Krystad Co-developed-by: Davide Caratti Signed-off-by: Davide Caratti Co-developed-by: Matthieu Baerts Signed-off-by: Matthieu Baerts Co-developed-by: Florian Westphal Signed-off-by: Florian Westphal Signed-off-by: Mat Martineau Signed-off-by: Christoph Paasch --- include/linux/tcp.h | 10 ++ include/net/mptcp.h | 8 + net/ipv4/tcp_input.c | 8 +- net/mptcp/options.c | 107 ++++++++++++ net/mptcp/protocol.c | 34 ++++ net/mptcp/protocol.h | 64 +++++++- net/mptcp/subflow.c | 383 ++++++++++++++++++++++++++++++++++++++++++- 7 files changed, 609 insertions(+), 5 deletions(-) diff --git a/include/linux/tcp.h b/include/linux/tcp.h index e9ee06d887fa..0d00dad4b85d 100644 --- a/include/linux/tcp.h +++ b/include/linux/tcp.h @@ -82,9 +82,19 @@ struct tcp_sack_block { struct mptcp_options_received { u64 sndr_key; u64 rcvr_key; + u64 data_ack; + u64 data_seq; + u32 subflow_seq; + u16 data_len; u8 mp_capable : 1, mp_join : 1, dss : 1; + u8 use_map:1, + dsn64:1, + data_fin:1, + use_ack:1, + ack64:1, + __unused:3; }; #endif diff --git a/include/net/mptcp.h b/include/net/mptcp.h index 06dcc665135e..8619c1fca741 100644 --- a/include/net/mptcp.h +++ b/include/net/mptcp.h @@ -60,6 +60,8 @@ bool mptcp_synack_options(const struct request_sock *req, unsigned int *size, bool mptcp_established_options(struct sock *sk, struct sk_buff *skb, unsigned int *size, unsigned int remaining, struct mptcp_out_options *opts); +void mptcp_incoming_options(struct sock *sk, struct sk_buff *skb, + struct tcp_options_received *opt_rx); void mptcp_write_options(__be32 *ptr, struct mptcp_out_options *opts); @@ -150,6 +152,12 @@ static inline bool mptcp_established_options(struct sock *sk, return false; } +static inline void mptcp_incoming_options(struct sock *sk, + struct sk_buff *skb, + struct tcp_options_received *opt_rx) +{ +} + static inline void mptcp_skb_ext_move(struct sk_buff *to, const struct sk_buff *from) { diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 5165c8de47ee..28d31f2c1422 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -4770,6 +4770,9 @@ static void tcp_data_queue(struct sock *sk, struct sk_buff *skb) bool fragstolen; int eaten; + if (sk_is_mptcp(sk)) + mptcp_incoming_options(sk, skb, &tp->rx_opt); + if (TCP_SKB_CB(skb)->seq == TCP_SKB_CB(skb)->end_seq) { __kfree_skb(skb); return; @@ -6346,8 +6349,11 @@ int tcp_rcv_state_process(struct sock *sk, struct sk_buff *skb) case TCP_CLOSE_WAIT: case TCP_CLOSING: case TCP_LAST_ACK: - if (!before(TCP_SKB_CB(skb)->seq, tp->rcv_nxt)) + if (!before(TCP_SKB_CB(skb)->seq, tp->rcv_nxt)) { + if (sk_is_mptcp(sk)) + mptcp_incoming_options(sk, skb, &tp->rx_opt); break; + } /* fall through */ case TCP_FIN_WAIT1: case TCP_FIN_WAIT2: diff --git a/net/mptcp/options.c b/net/mptcp/options.c index 5ea127bc7f01..1fa8496f3551 100644 --- a/net/mptcp/options.c +++ b/net/mptcp/options.c @@ -14,6 +14,7 @@ void mptcp_parse_option(const unsigned char *ptr, int opsize, { struct mptcp_options_received *mp_opt = &opt_rx->mptcp; u8 subtype = *ptr >> 4; + int expected_opsize; u8 version; u8 flags; @@ -64,7 +65,79 @@ void mptcp_parse_option(const unsigned char *ptr, int opsize, case MPTCPOPT_DSS: pr_debug("DSS"); + ptr++; + + flags = (*ptr++) & MPTCP_DSS_FLAG_MASK; + mp_opt->data_fin = (flags & MPTCP_DSS_DATA_FIN) != 0; + mp_opt->dsn64 = (flags & MPTCP_DSS_DSN64) != 0; + mp_opt->use_map = (flags & MPTCP_DSS_HAS_MAP) != 0; + mp_opt->ack64 = (flags & MPTCP_DSS_ACK64) != 0; + mp_opt->use_ack = (flags & MPTCP_DSS_HAS_ACK); + + pr_debug("data_fin=%d dsn64=%d use_map=%d ack64=%d use_ack=%d", + mp_opt->data_fin, mp_opt->dsn64, + mp_opt->use_map, mp_opt->ack64, + mp_opt->use_ack); + + expected_opsize = TCPOLEN_MPTCP_DSS_BASE; + + if (mp_opt->use_ack) { + if (mp_opt->ack64) + expected_opsize += TCPOLEN_MPTCP_DSS_ACK64; + else + expected_opsize += TCPOLEN_MPTCP_DSS_ACK32; + } + + if (mp_opt->use_map) { + if (mp_opt->dsn64) + expected_opsize += TCPOLEN_MPTCP_DSS_MAP64; + else + expected_opsize += TCPOLEN_MPTCP_DSS_MAP32; + } + + /* RFC 6824, Section 3.3: + * If a checksum is present, but its use had + * not been negotiated in the MP_CAPABLE handshake, + * the checksum field MUST be ignored. + */ + if (opsize != expected_opsize && + opsize != expected_opsize + TCPOLEN_MPTCP_DSS_CHECKSUM) + break; + mp_opt->dss = 1; + + if (mp_opt->use_ack) { + if (mp_opt->ack64) { + mp_opt->data_ack = get_unaligned_be64(ptr); + ptr += 8; + } else { + mp_opt->data_ack = get_unaligned_be32(ptr); + ptr += 4; + } + + pr_debug("data_ack=%llu", mp_opt->data_ack); + } + + if (mp_opt->use_map) { + if (mp_opt->dsn64) { + mp_opt->data_seq = get_unaligned_be64(ptr); + ptr += 8; + } else { + mp_opt->data_seq = get_unaligned_be32(ptr); + ptr += 4; + } + + mp_opt->subflow_seq = get_unaligned_be32(ptr); + ptr += 4; + + mp_opt->data_len = get_unaligned_be16(ptr); + ptr += 2; + + pr_debug("data_seq=%llu subflow_seq=%u data_len=%u", + mp_opt->data_seq, mp_opt->subflow_seq, + mp_opt->data_len); + } + break; default: @@ -275,6 +348,40 @@ bool mptcp_synack_options(const struct request_sock *req, unsigned int *size, return false; } +void mptcp_incoming_options(struct sock *sk, struct sk_buff *skb, + struct tcp_options_received *opt_rx) +{ + struct mptcp_options_received *mp_opt; + struct mptcp_ext *mpext; + + mp_opt = &opt_rx->mptcp; + + if (!mp_opt->dss) + return; + + mpext = skb_ext_add(skb, SKB_EXT_MPTCP); + if (!mpext) + return; + + memset(mpext, 0, sizeof(*mpext)); + + if (mp_opt->use_map) { + mpext->data_seq = mp_opt->data_seq; + mpext->subflow_seq = mp_opt->subflow_seq; + mpext->data_len = mp_opt->data_len; + mpext->use_map = 1; + mpext->dsn64 = mp_opt->dsn64; + } + + if (mp_opt->use_ack) { + mpext->data_ack = mp_opt->data_ack; + mpext->use_ack = 1; + mpext->ack64 = mp_opt->ack64; + } + + mpext->data_fin = mp_opt->data_fin; +} + void mptcp_write_options(__be32 *ptr, struct mptcp_out_options *opts) { if ((OPTION_MPTCP_MPC_SYN | diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c index 8cf49193b1c0..71250149180b 100644 --- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -224,6 +224,33 @@ static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len) return ret; } +int mptcp_read_actor(read_descriptor_t *desc, struct sk_buff *skb, + unsigned int offset, size_t len) +{ + struct mptcp_read_arg *arg = desc->arg.data; + size_t copy_len; + + copy_len = min(desc->count, len); + + if (likely(arg->msg)) { + int err; + + err = skb_copy_datagram_msg(skb, offset, arg->msg, copy_len); + if (err) { + pr_debug("error path"); + desc->error = err; + return err; + } + } else { + pr_debug("Flushing skb payload"); + } + + desc->count -= copy_len; + + pr_debug("consumed %zu bytes, %zu left", copy_len, desc->count); + return copy_len; +} + static int mptcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, int nonblock, int flags, int *addr_len) { @@ -414,7 +441,10 @@ static struct sock *mptcp_accept(struct sock *sk, int flags, int *err, msk->write_seq = subflow->idsn + 1; ack_seq++; msk->ack_seq = ack_seq; + subflow->map_seq = ack_seq; + subflow->map_subflow_seq = 1; subflow->rel_write_seq = 1; + subflow->tcp_sock = ssk; newsk = new_mptcp_sock; mptcp_copy_inaddrs(newsk, ssk); list_add(&subflow->node, &msk->conn_list); @@ -519,8 +549,12 @@ void mptcp_finish_connect(struct sock *ssk) sk = subflow->conn; msk = mptcp_sk(sk); + pr_debug("msk=%p, token=%u", sk, subflow->token); + mptcp_crypto_key_sha(subflow->remote_key, NULL, &ack_seq); ack_seq++; + subflow->map_seq = ack_seq; + subflow->map_subflow_seq = 1; subflow->rel_write_seq = 1; /* the socket is not connected yet, no msk/subflow ops can access/race diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h index 384ec4804198..c6d8217e24d4 100644 --- a/net/mptcp/protocol.h +++ b/net/mptcp/protocol.h @@ -33,7 +33,9 @@ #define TCPOLEN_MPTCP_MPC_SYNACK 12 #define TCPOLEN_MPTCP_MPC_ACK 20 #define TCPOLEN_MPTCP_DSS_BASE 4 +#define TCPOLEN_MPTCP_DSS_ACK32 4 #define TCPOLEN_MPTCP_DSS_ACK64 8 +#define TCPOLEN_MPTCP_DSS_MAP32 10 #define TCPOLEN_MPTCP_DSS_MAP64 14 #define TCPOLEN_MPTCP_DSS_CHECKSUM 2 @@ -50,6 +52,10 @@ #define MPTCP_DSS_HAS_MAP BIT(2) #define MPTCP_DSS_ACK64 BIT(1) #define MPTCP_DSS_HAS_ACK BIT(0) +#define MPTCP_DSS_FLAG_MASK (0x1F) + +/* MPTCP socket flags */ +#define MPTCP_DATA_READY BIT(0) /* MPTCP connection sock */ struct mptcp_sock { @@ -60,6 +66,7 @@ struct mptcp_sock { u64 write_seq; u64 ack_seq; u32 token; + unsigned long flags; struct list_head conn_list; struct skb_ext *cached_ext; /* for the next sendmsg */ struct socket *subflow; /* outgoing connect/listener/!mp_capable */ @@ -82,6 +89,7 @@ struct mptcp_subflow_request_sock { u64 remote_key; u64 idsn; u32 token; + u32 ssn_offset; }; static inline struct mptcp_subflow_request_sock * @@ -96,15 +104,27 @@ struct mptcp_subflow_context { u64 local_key; u64 remote_key; u64 idsn; + u64 map_seq; u32 token; u32 rel_write_seq; + u32 map_subflow_seq; + u32 ssn_offset; + u32 map_data_len; u32 request_mptcp : 1, /* send MP_CAPABLE */ mp_capable : 1, /* remote is MPTCP capable */ fourth_ack : 1, /* send initial DSS */ - conn_finished : 1; + conn_finished : 1, + map_valid : 1, + data_avail : 1, + rx_eof : 1; + struct sock *tcp_sock; /* tcp sk backpointer */ struct sock *conn; /* parent mptcp_sock */ const struct inet_connection_sock_af_ops *icsk_af_ops; + void (*tcp_data_ready)(struct sock *sk); + void (*tcp_state_change)(struct sock *sk); + void (*tcp_write_space)(struct sock *sk); + struct rcu_head rcu; }; @@ -123,14 +143,49 @@ mptcp_subflow_tcp_sock(const struct mptcp_subflow_context *subflow) return subflow->tcp_sock; } +static inline u64 +mptcp_subflow_get_map_offset(const struct mptcp_subflow_context *subflow) +{ + return tcp_sk(mptcp_subflow_tcp_sock(subflow))->copied_seq - + subflow->ssn_offset - + subflow->map_subflow_seq; +} + +static inline u64 +mptcp_subflow_get_mapped_dsn(const struct mptcp_subflow_context *subflow) +{ + return subflow->map_seq + mptcp_subflow_get_map_offset(subflow); +} + +int mptcp_is_enabled(struct net *net); +bool mptcp_subflow_data_available(struct sock *sk); void mptcp_subflow_init(void); int mptcp_subflow_create_socket(struct sock *sk, struct socket **new_sock); +static inline void mptcp_subflow_tcp_fallback(struct sock *sk, + struct mptcp_subflow_context *ctx) +{ + sk->sk_data_ready = ctx->tcp_data_ready; + sk->sk_state_change = ctx->tcp_state_change; + sk->sk_write_space = ctx->tcp_write_space; + + inet_csk(sk)->icsk_af_ops = ctx->icsk_af_ops; +} + extern const struct inet_connection_sock_af_ops ipv4_specific; #if IS_ENABLED(CONFIG_MPTCP_IPV6) extern const struct inet_connection_sock_af_ops ipv6_specific; #endif +void mptcp_proto_init(void); + +struct mptcp_read_arg { + struct msghdr *msg; +}; + +int mptcp_read_actor(read_descriptor_t *desc, struct sk_buff *skb, + unsigned int offset, size_t len); + void mptcp_get_options(const struct sk_buff *skb, struct tcp_options_received *opt_rx); @@ -164,4 +219,11 @@ static inline struct mptcp_ext *mptcp_get_ext(struct sk_buff *skb) return (struct mptcp_ext *)skb_ext_find(skb, SKB_EXT_MPTCP); } +static inline bool before64(__u64 seq1, __u64 seq2) +{ + return (__s64)(seq1 - seq2) < 0; +} + +#define after64(seq2, seq1) before64(seq1, seq2) + #endif /* __MPTCP_PROTOCOL_H */ diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c index 89b91bc7a831..528351e26371 100644 --- a/net/mptcp/subflow.c +++ b/net/mptcp/subflow.c @@ -78,6 +78,7 @@ static void subflow_init_req(struct request_sock *req, subflow_req->mp_capable = 1; subflow_req->remote_key = rx_opt.mptcp.sndr_key; + subflow_req->ssn_offset = TCP_SKB_CB(skb)->seq; } } @@ -116,6 +117,11 @@ static void subflow_finish_connect(struct sock *sk, const struct sk_buff *skb) subflow->remote_key); mptcp_finish_connect(sk); subflow->conn_finished = 1; + + if (skb) { + pr_debug("synack seq=%u", TCP_SKB_CB(skb)->seq); + subflow->ssn_offset = TCP_SKB_CB(skb)->seq; + } } } @@ -210,6 +216,323 @@ static struct sock *subflow_syn_recv_sock(const struct sock *sk, static struct inet_connection_sock_af_ops subflow_specific; +enum mapping_status { + MAPPING_OK, + MAPPING_INVALID, + MAPPING_EMPTY, + MAPPING_DATA_FIN +}; + +static u64 expand_seq(u64 old_seq, u16 old_data_len, u64 seq) +{ + if ((u32)seq == (u32)old_seq) + return old_seq; + + /* Assume map covers data not mapped yet. */ + return seq | ((old_seq + old_data_len + 1) & GENMASK_ULL(63, 32)); +} + +static void warn_bad_map(struct mptcp_subflow_context *subflow, u32 ssn) +{ + WARN_ONCE(1, "Bad mapping: ssn=%d map_seq=%d map_data_len=%d", + ssn, subflow->map_subflow_seq, subflow->map_data_len); +} + +static bool skb_is_fully_mapped(struct sock *ssk, struct sk_buff *skb) +{ + struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk); + unsigned int skb_consumed; + + skb_consumed = tcp_sk(ssk)->copied_seq - TCP_SKB_CB(skb)->seq; + if (WARN_ON_ONCE(skb_consumed >= skb->len)) + return true; + + return skb->len - skb_consumed <= subflow->map_data_len - + mptcp_subflow_get_map_offset(subflow); +} + +static bool validate_mapping(struct sock *ssk, struct sk_buff *skb) +{ + struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk); + u32 ssn = tcp_sk(ssk)->copied_seq - subflow->ssn_offset; + + if (unlikely(before(ssn, subflow->map_subflow_seq))) { + /* Mapping covers data later in the subflow stream, + * currently unsupported. + */ + warn_bad_map(subflow, ssn); + return false; + } + if (unlikely(!before(ssn, subflow->map_subflow_seq + + subflow->map_data_len))) { + /* Mapping does covers past subflow data, invalid */ + warn_bad_map(subflow, ssn + skb->len); + return false; + } + return true; +} + +static enum mapping_status get_mapping_status(struct sock *ssk) +{ + struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk); + struct mptcp_ext *mpext; + struct sk_buff *skb; + u16 data_len; + u64 map_seq; + + skb = skb_peek(&ssk->sk_receive_queue); + if (!skb) + return MAPPING_EMPTY; + + mpext = mptcp_get_ext(skb); + if (!mpext || !mpext->use_map) { + if (!subflow->map_valid && !skb->len) { + /* the TCP stack deliver 0 len FIN pkt to the receive + * queue, that is the only 0len pkts ever expected here, + * and we can admit no mapping only for 0 len pkts + */ + if (!(TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN)) + WARN_ONCE(1, "0len seq %d:%d flags %x", + TCP_SKB_CB(skb)->seq, + TCP_SKB_CB(skb)->end_seq, + TCP_SKB_CB(skb)->tcp_flags); + sk_eat_skb(ssk, skb); + return MAPPING_EMPTY; + } + + if (!subflow->map_valid) + return MAPPING_INVALID; + + goto validate_seq; + } + + pr_debug("seq=%llu is64=%d ssn=%u data_len=%u data_fin=%d", + mpext->data_seq, mpext->dsn64, mpext->subflow_seq, + mpext->data_len, mpext->data_fin); + + data_len = mpext->data_len; + if (data_len == 0) { + pr_err("Infinite mapping not handled"); + return MAPPING_INVALID; + } + + if (mpext->data_fin == 1) { + if (data_len == 1) { + pr_debug("DATA_FIN with no payload"); + if (subflow->map_valid) { + /* A DATA_FIN might arrive in a DSS + * option before the previous mapping + * has been fully consumed. Continue + * handling the existing mapping. + */ + skb_ext_del(skb, SKB_EXT_MPTCP); + return MAPPING_OK; + } else { + return MAPPING_DATA_FIN; + } + } + + /* Adjust for DATA_FIN using 1 byte of sequence space */ + data_len--; + } + + if (!mpext->dsn64) { + map_seq = expand_seq(subflow->map_seq, subflow->map_data_len, + mpext->data_seq); + pr_debug("expanded seq=%llu", subflow->map_seq); + } else { + map_seq = mpext->data_seq; + } + + if (subflow->map_valid) { + /* Allow replacing only with an identical map */ + if (subflow->map_seq == map_seq && + subflow->map_subflow_seq == mpext->subflow_seq && + subflow->map_data_len == data_len) { + skb_ext_del(skb, SKB_EXT_MPTCP); + return MAPPING_OK; + } + + /* If this skb data are fully covered by the current mapping, + * the new map would need caching, which is not supported + */ + if (skb_is_fully_mapped(ssk, skb)) + return MAPPING_INVALID; + + /* will validate the next map after consuming the current one */ + return MAPPING_OK; + } + + subflow->map_seq = map_seq; + subflow->map_subflow_seq = mpext->subflow_seq; + subflow->map_data_len = data_len; + subflow->map_valid = 1; + pr_debug("new map seq=%llu subflow_seq=%u data_len=%u", + subflow->map_seq, subflow->map_subflow_seq, + subflow->map_data_len); + +validate_seq: + /* we revalidate valid mapping on new skb, because we must ensure + * the current skb is completely covered by the available mapping + */ + if (!validate_mapping(ssk, skb)) + return MAPPING_INVALID; + + skb_ext_del(skb, SKB_EXT_MPTCP); + return MAPPING_OK; +} + +static bool subflow_check_data_avail(struct sock *ssk) +{ + struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk); + enum mapping_status status; + struct mptcp_sock *msk; + struct sk_buff *skb; + + pr_debug("msk=%p ssk=%p data_avail=%d skb=%p", subflow->conn, ssk, + subflow->data_avail, skb_peek(&ssk->sk_receive_queue)); + if (subflow->data_avail) + return true; + + if (!subflow->conn) + return false; + + msk = mptcp_sk(subflow->conn); + for (;;) { + u32 map_remaining; + size_t delta; + u64 ack_seq; + u64 old_ack; + + status = get_mapping_status(ssk); + pr_debug("msk=%p ssk=%p status=%d", msk, ssk, status); + if (status == MAPPING_INVALID) { + ssk->sk_err = EBADMSG; + goto fatal; + } + + if (status != MAPPING_OK) + return false; + + skb = skb_peek(&ssk->sk_receive_queue); + if (WARN_ON_ONCE(!skb)) + return false; + + old_ack = READ_ONCE(msk->ack_seq); + ack_seq = mptcp_subflow_get_mapped_dsn(subflow); + pr_debug("msk ack_seq=%llx subflow ack_seq=%llx", old_ack, + ack_seq); + if (ack_seq == old_ack) + break; + + /* only accept in-sequence mapping. Old values are spurious + * retransmission; we can hit "future" values on active backup + * subflow switch, we relay on retransmissions to get + * in-sequence data. + * Cuncurrent subflows support will require subflow data + * reordering + */ + map_remaining = subflow->map_data_len - + mptcp_subflow_get_map_offset(subflow); + if (before64(ack_seq, old_ack)) + delta = min_t(size_t, old_ack - ack_seq, map_remaining); + else + delta = min_t(size_t, ack_seq - old_ack, map_remaining); + + /* discard mapped data */ + pr_debug("discarding %zu bytes, current map len=%d", delta, + map_remaining); + if (delta) { + struct mptcp_read_arg arg = { + .msg = NULL, + }; + read_descriptor_t desc = { + .count = delta, + .arg.data = &arg, + }; + int ret; + + ret = tcp_read_sock(ssk, &desc, mptcp_read_actor); + if (ret < 0) { + ssk->sk_err = -ret; + goto fatal; + } + if (ret < delta) + return false; + if (delta == map_remaining) + subflow->map_valid = 0; + } + } + return true; + +fatal: + /* fatal protocol error, close the socket */ + /* This barrier is coupled with smp_rmb() in tcp_poll() */ + smp_wmb(); + ssk->sk_error_report(ssk); + tcp_set_state(ssk, TCP_CLOSE); + tcp_send_active_reset(ssk, GFP_ATOMIC); + return false; +} + +bool mptcp_subflow_data_available(struct sock *sk) +{ + struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(sk); + struct sk_buff *skb; + + /* check if current mapping is still valid */ + if (subflow->map_valid && + mptcp_subflow_get_map_offset(subflow) >= subflow->map_data_len) { + subflow->map_valid = 0; + subflow->data_avail = 0; + + pr_debug("Done with mapping: seq=%u data_len=%u", + subflow->map_subflow_seq, + subflow->map_data_len); + } + + if (!subflow_check_data_avail(sk)) { + subflow->data_avail = 0; + return false; + } + + skb = skb_peek(&sk->sk_receive_queue); + subflow->data_avail = skb && + before(tcp_sk(sk)->copied_seq, TCP_SKB_CB(skb)->end_seq); + return subflow->data_avail; +} + +static void subflow_data_ready(struct sock *sk) +{ + struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(sk); + struct sock *parent = subflow->conn; + + if (!parent || !subflow->mp_capable) { + subflow->tcp_data_ready(sk); + + if (parent) + parent->sk_data_ready(parent); + return; + } + + if (mptcp_subflow_data_available(sk)) { + set_bit(MPTCP_DATA_READY, &mptcp_sk(parent)->flags); + + parent->sk_data_ready(parent); + } +} + +static void subflow_write_space(struct sock *sk) +{ + struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(sk); + struct sock *parent = subflow->conn; + + sk_stream_write_space(sk); + if (parent && sk_stream_is_writeable(sk)) { + sk_stream_write_space(parent); + } +} + static struct inet_connection_sock_af_ops * subflow_default_af_ops(struct sock *sk) { @@ -296,6 +619,47 @@ static struct mptcp_subflow_context *subflow_create_ctx(struct sock *sk, return ctx; } +static void __subflow_state_change(struct sock *sk) +{ + struct socket_wq *wq; + + rcu_read_lock(); + wq = rcu_dereference(sk->sk_wq); + if (skwq_has_sleeper(wq)) + wake_up_interruptible_all(&wq->wait); + rcu_read_unlock(); +} + +static bool subflow_is_done(const struct sock *sk) +{ + return sk->sk_shutdown & RCV_SHUTDOWN || sk->sk_state == TCP_CLOSE; +} + +static void subflow_state_change(struct sock *sk) +{ + struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(sk); + struct sock *parent = READ_ONCE(subflow->conn); + + __subflow_state_change(sk); + + /* as recvmsg() does not acquire the subflow socket for ssk selection + * a fin packet carrying a DSS can be unnoticed if we don't trigger + * the data available machinery here. + */ + if (parent && subflow->mp_capable && mptcp_subflow_data_available(sk)) { + set_bit(MPTCP_DATA_READY, &mptcp_sk(parent)->flags); + + parent->sk_data_ready(parent); + } + + if (parent && !(parent->sk_shutdown & RCV_SHUTDOWN) && + !subflow->rx_eof && subflow_is_done(sk)) { + subflow->rx_eof = 1; + parent->sk_shutdown |= RCV_SHUTDOWN; + __subflow_state_change(parent); + } +} + static int subflow_ulp_init(struct sock *sk) { struct inet_connection_sock *icsk = inet_csk(sk); @@ -322,6 +686,12 @@ static int subflow_ulp_init(struct sock *sk) tp->is_mptcp = 1; ctx->icsk_af_ops = icsk->icsk_af_ops; icsk->icsk_af_ops = subflow_default_af_ops(sk); + ctx->tcp_data_ready = sk->sk_data_ready; + ctx->tcp_state_change = sk->sk_state_change; + ctx->tcp_write_space = sk->sk_write_space; + sk->sk_data_ready = subflow_data_ready; + sk->sk_write_space = subflow_write_space; + sk->sk_state_change = subflow_state_change; out: return err; } @@ -339,10 +709,12 @@ static void subflow_ulp_release(struct sock *sk) kfree_rcu(ctx, rcu); } -static void subflow_ulp_fallback(struct sock *sk) +static void subflow_ulp_fallback(struct sock *sk, + struct mptcp_subflow_context *old_ctx) { struct inet_connection_sock *icsk = inet_csk(sk); + mptcp_subflow_tcp_fallback(sk, old_ctx); icsk->icsk_ulp_ops = NULL; rcu_assign_pointer(icsk->icsk_ulp_data, NULL); tcp_sk(sk)->is_mptcp = 0; @@ -357,23 +729,28 @@ static void subflow_ulp_clone(const struct request_sock *req, struct mptcp_subflow_context *new_ctx; if (!subflow_req->mp_capable) { - subflow_ulp_fallback(newsk); + subflow_ulp_fallback(newsk, old_ctx); return; } new_ctx = subflow_create_ctx(newsk, priority); if (new_ctx == NULL) { - subflow_ulp_fallback(newsk); + subflow_ulp_fallback(newsk, old_ctx); return; } new_ctx->conn_finished = 1; new_ctx->icsk_af_ops = old_ctx->icsk_af_ops; + new_ctx->tcp_data_ready = old_ctx->tcp_data_ready; + new_ctx->tcp_state_change = old_ctx->tcp_state_change; + new_ctx->tcp_write_space = old_ctx->tcp_write_space; new_ctx->mp_capable = 1; new_ctx->fourth_ack = 1; new_ctx->remote_key = subflow_req->remote_key; new_ctx->local_key = subflow_req->local_key; new_ctx->token = subflow_req->token; + new_ctx->ssn_offset = subflow_req->ssn_offset; + new_ctx->idsn = subflow_req->idsn; } static struct tcp_ulp_ops subflow_ulp_ops __read_mostly = { From patchwork Wed Jan 22 00:56:25 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Christoph Paasch X-Patchwork-Id: 1226865 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (no SPF record) smtp.mailfrom=lists.01.org (client-ip=2001:19d0:306:5::1; helo=ml01.01.org; envelope-from=mptcp-bounces@lists.01.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=quarantine dis=none) header.from=apple.com Authentication-Results: ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=apple.com header.i=@apple.com header.a=rsa-sha256 header.s=20180706 header.b=a/QvTUbt; dkim-atps=neutral Received: from ml01.01.org (ml01.01.org [IPv6:2001:19d0:306:5::1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 482Rnl5GgLz9sRk for ; Wed, 22 Jan 2020 11:57:15 +1100 (AEDT) Received: from ml01.vlan13.01.org (localhost [IPv6:::1]) by ml01.01.org (Postfix) with ESMTP id 911ED1007B8F0; Tue, 21 Jan 2020 17:00:32 -0800 (PST) Received-SPF: Pass (mailfrom) identity=mailfrom; client-ip=17.171.2.72; helo=ma1-aaemail-dr-lapp03.apple.com; envelope-from=cpaasch@apple.com; receiver= Received: from ma1-aaemail-dr-lapp03.apple.com (ma1-aaemail-dr-lapp03.apple.com [17.171.2.72]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ml01.01.org (Postfix) with ESMTPS id 358E21007B8E1 for ; Tue, 21 Jan 2020 17:00:30 -0800 (PST) Received: from pps.filterd (ma1-aaemail-dr-lapp03.apple.com [127.0.0.1]) by ma1-aaemail-dr-lapp03.apple.com (8.16.0.27/8.16.0.27) with SMTP id 00M0uwKK005654; Tue, 21 Jan 2020 16:57:09 -0800 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=apple.com; h=sender : from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding; s=20180706; bh=9DpUkEE1hQqMvpowGbm9a1ANRZcsZC2me7f7WEkKvHA=; b=a/QvTUbtr874c+U53ccyvGTiqCWy+aCxlrkt/+WHGafVAVL5oIQA7gaq7VgOwX/1EIgW 6QZHt9MVPERQmRwnfWk4yojFgSRJ0RDbTS/vlLLWPCSW/p/vEf15i72BdRR/GsyQluBZ WN/WeS/LKWmTK1eElxtbUgFbYmK+2iu/DHwM5SQ/shROGuM6dguFGrZoNptjkExF8Y6D 2bM+Gi1J3YKOpxVs+C491Jv/f8WzIqbQkfqF5fJmSWnMoL18kIZQAnI8KA6aGVprCXf4 0mHoZXDzh4gs+Dt1rjAh7QmzP+qKC5gRHhJ7GPYQnSogGjaNiVWxhMvruQdxvXPtEfzv Jw== Received: from ma1-mtap-s02.corp.apple.com (ma1-mtap-s02.corp.apple.com [17.40.76.6]) by ma1-aaemail-dr-lapp03.apple.com with ESMTP id 2xm1w0q03h-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NO); Tue, 21 Jan 2020 16:57:09 -0800 Received: from nwk-mmpp-sz13.apple.com (nwk-mmpp-sz13.apple.com [17.128.115.216]) by ma1-mtap-s02.corp.apple.com (Oracle Communications Messaging Server 8.0.2.4.20190507 64bit (built May 7 2019)) with ESMTPS id <0Q4H0018HHB4F030@ma1-mtap-s02.corp.apple.com>; Tue, 21 Jan 2020 16:57:04 -0800 (PST) Received: from process_milters-daemon.nwk-mmpp-sz13.apple.com by nwk-mmpp-sz13.apple.com (Oracle Communications Messaging Server 8.0.2.4.20190507 64bit (built May 7 2019)) id <0Q4H00500GU2WL00@nwk-mmpp-sz13.apple.com>; Tue, 21 Jan 2020 16:57:04 -0800 (PST) X-Va-A: X-Va-T-CD: 4b1e0bf36502e052fc75ad21b706ed24 X-Va-E-CD: a2fb2f7069aca739eb7b193888fb8781 X-Va-R-CD: 28d5f2ab3b4ba4452e361aa80a15662e X-Va-CD: 0 X-Va-ID: 2b96e8d1-09a9-4ba9-8f74-78dcfe1b64df X-V-A: X-V-T-CD: 4b1e0bf36502e052fc75ad21b706ed24 X-V-E-CD: a2fb2f7069aca739eb7b193888fb8781 X-V-R-CD: 28d5f2ab3b4ba4452e361aa80a15662e X-V-CD: 0 X-V-ID: 98c092a9-5d96-42b8-a0d7-346ad2619ca9 X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:,, definitions=2020-01-17_05:,, signatures=0 Received: from localhost ([17.192.155.241]) by nwk-mmpp-sz13.apple.com (Oracle Communications Messaging Server 8.0.2.4.20190507 64bit (built May 7 2019)) with ESMTPSA id <0Q4H00127HB04Y50@nwk-mmpp-sz13.apple.com>; Tue, 21 Jan 2020 16:57:00 -0800 (PST) Sender: cpaasch@apple.com From: Christoph Paasch To: netdev@vger.kernel.org Date: Tue, 21 Jan 2020 16:56:25 -0800 Message-id: <20200122005633.21229-12-cpaasch@apple.com> X-Mailer: git-send-email 2.23.0 In-reply-to: <20200122005633.21229-1-cpaasch@apple.com> References: <20200122005633.21229-1-cpaasch@apple.com> MIME-version: 1.0 X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:, , definitions=2020-01-17_05:, , signatures=0 Message-ID-Hash: AYS43E4EB7CCTEBN2WQEHO3PTIWL2EYG X-Message-ID-Hash: AYS43E4EB7CCTEBN2WQEHO3PTIWL2EYG X-MailFrom: cpaasch@apple.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; suspicious-header CC: mptcp@lists.01.org X-Mailman-Version: 3.1.1 Precedence: list Subject: [MPTCP] [PATCH net-next v3 11/19] mptcp: add subflow write space signalling and mptcp_poll List-Id: Discussions regarding MPTCP upstreaming Archived-At: List-Archive: List-Help: List-Post: List-Subscribe: List-Unsubscribe: From: Florian Westphal Add new SEND_SPACE flag to indicate that a subflow has enough space to accept more data for transmission. It gets cleared at the end of mptcp_sendmsg() in case ssk has run below the free watermark. It is (re-set) from the wspace callback. This allows us to use msk->flags to determine the poll mask. Co-developed-by: Peter Krystad Signed-off-by: Peter Krystad Signed-off-by: Florian Westphal Signed-off-by: Christoph Paasch --- net/mptcp/protocol.c | 53 ++++++++++++++++++++++++++++++++++++++++++++ net/mptcp/protocol.h | 1 + net/mptcp/subflow.c | 3 +++ 3 files changed, 57 insertions(+) diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c index 71250149180b..408efbe34753 100644 --- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -176,6 +176,23 @@ static int mptcp_sendmsg_frag(struct sock *sk, struct sock *ssk, return ret; } +static void ssk_check_wmem(struct mptcp_sock *msk, struct sock *ssk) +{ + struct socket *sock; + + if (likely(sk_stream_is_writeable(ssk))) + return; + + sock = READ_ONCE(ssk->sk_socket); + + if (sock) { + clear_bit(MPTCP_SEND_SPACE, &msk->flags); + smp_mb__after_atomic(); + /* set NOSPACE only after clearing SEND_SPACE flag */ + set_bit(SOCK_NOSPACE, &sock->flags); + } +} + static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len) { struct mptcp_sock *msk = mptcp_sk(sk); @@ -219,6 +236,7 @@ static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len) if (copied > 0) ret = copied; + ssk_check_wmem(msk, ssk); release_sock(ssk); release_sock(sk); return ret; @@ -315,6 +333,7 @@ static int mptcp_init_sock(struct sock *sk) struct mptcp_sock *msk = mptcp_sk(sk); INIT_LIST_HEAD(&msk->conn_list); + __set_bit(MPTCP_SEND_SPACE, &msk->flags); return 0; } @@ -576,6 +595,13 @@ static void mptcp_sock_graft(struct sock *sk, struct socket *parent) write_unlock_bh(&sk->sk_callback_lock); } +static bool mptcp_memory_free(const struct sock *sk, int wake) +{ + struct mptcp_sock *msk = mptcp_sk(sk); + + return wake ? test_bit(MPTCP_SEND_SPACE, &msk->flags) : true; +} + static struct proto mptcp_prot = { .name = "MPTCP", .owner = THIS_MODULE, @@ -591,6 +617,7 @@ static struct proto mptcp_prot = { .hash = inet_hash, .unhash = inet_unhash, .get_port = mptcp_get_port, + .stream_memory_free = mptcp_memory_free, .obj_size = sizeof(struct mptcp_sock), .no_autobind = true, }; @@ -767,8 +794,34 @@ static int mptcp_stream_accept(struct socket *sock, struct socket *newsock, static __poll_t mptcp_poll(struct file *file, struct socket *sock, struct poll_table_struct *wait) { + const struct mptcp_sock *msk; + struct sock *sk = sock->sk; + struct socket *ssock; __poll_t mask = 0; + msk = mptcp_sk(sk); + lock_sock(sk); + ssock = __mptcp_nmpc_socket(msk); + if (ssock) { + mask = ssock->ops->poll(file, ssock, wait); + release_sock(sk); + return mask; + } + + release_sock(sk); + sock_poll_wait(file, sock, wait); + lock_sock(sk); + + if (test_bit(MPTCP_DATA_READY, &msk->flags)) + mask = EPOLLIN | EPOLLRDNORM; + if (sk_stream_is_writeable(sk) && + test_bit(MPTCP_SEND_SPACE, &msk->flags)) + mask |= EPOLLOUT | EPOLLWRNORM; + if (sk->sk_shutdown & RCV_SHUTDOWN) + mask |= EPOLLIN | EPOLLRDNORM | EPOLLRDHUP; + + release_sock(sk); + return mask; } diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h index c6d8217e24d4..59a83eb64d37 100644 --- a/net/mptcp/protocol.h +++ b/net/mptcp/protocol.h @@ -56,6 +56,7 @@ /* MPTCP socket flags */ #define MPTCP_DATA_READY BIT(0) +#define MPTCP_SEND_SPACE BIT(1) /* MPTCP connection sock */ struct mptcp_sock { diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c index 528351e26371..9fb3eb87a20f 100644 --- a/net/mptcp/subflow.c +++ b/net/mptcp/subflow.c @@ -529,6 +529,9 @@ static void subflow_write_space(struct sock *sk) sk_stream_write_space(sk); if (parent && sk_stream_is_writeable(sk)) { + set_bit(MPTCP_SEND_SPACE, &mptcp_sk(parent)->flags); + smp_mb__after_atomic(); + /* set SEND_SPACE before sk_stream_write_space clears NOSPACE */ sk_stream_write_space(parent); } } From patchwork Wed Jan 22 00:56:26 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Christoph Paasch X-Patchwork-Id: 1226860 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (no SPF record) smtp.mailfrom=lists.01.org (client-ip=2001:19d0:306:5::1; helo=ml01.01.org; envelope-from=mptcp-bounces@lists.01.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=quarantine dis=none) header.from=apple.com Authentication-Results: ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=apple.com header.i=@apple.com header.a=rsa-sha256 header.s=20180706 header.b=De5+OqOM; dkim-atps=neutral Received: from ml01.01.org (ml01.01.org [IPv6:2001:19d0:306:5::1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 482Rnf6hCDz9sRp for ; Wed, 22 Jan 2020 11:57:10 +1100 (AEDT) Received: from ml01.vlan13.01.org (localhost [IPv6:::1]) by ml01.01.org (Postfix) with ESMTP id 4A1501007B8F8; Tue, 21 Jan 2020 17:00:27 -0800 (PST) Received-SPF: Pass (mailfrom) identity=mailfrom; client-ip=17.151.62.68; helo=nwk-aaemail-lapp03.apple.com; envelope-from=cpaasch@apple.com; receiver= Received: from nwk-aaemail-lapp03.apple.com (nwk-aaemail-lapp03.apple.com [17.151.62.68]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ml01.01.org (Postfix) with ESMTPS id DEAE81007B8E1 for ; Tue, 21 Jan 2020 17:00:24 -0800 (PST) Received: from pps.filterd (nwk-aaemail-lapp03.apple.com [127.0.0.1]) by nwk-aaemail-lapp03.apple.com (8.16.0.27/8.16.0.27) with SMTP id 00M0urnW013197; Tue, 21 Jan 2020 16:57:05 -0800 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=apple.com; h=sender : from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding; s=20180706; bh=FF49Fb9RtWXZBsApzlxTuyZuf2p3VUaT3kxHzi0eDgE=; b=De5+OqOM+5xjiZ0JVE6IwulVnSKr/zcPnGCyxygYACkjPLfI4q44Olwyvt/3LbXPDIrS MJHzajdZ+Bql1ywUIIKDzXGXq8uqdA2aQuU5iRTMeqcvHn6dRyySQF9qPyQtiTBDidpB 0Xxc5tXm9W57xFtATQff9AiTePWt8pULYTY2a0uiIa6Y48WqwKgBggoZG52atW4nGZWy ECNRUPvcvypPnQ2l42mttba8nb53Ui98grJqsV1+MGeTC0DG/5uBB81wSgPpElcUmF2i WKxICRQEZ8iflwREKw2erQPjxlcmS4CRilEmd3hjVvVEzV4TTX3j2JnW6salQDnVgOSw Tw== Received: from ma1-mtap-s01.corp.apple.com (ma1-mtap-s01.corp.apple.com [17.40.76.5]) by nwk-aaemail-lapp03.apple.com with ESMTP id 2xmk4p1699-3 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NO); Tue, 21 Jan 2020 16:57:05 -0800 Received: from nwk-mmpp-sz13.apple.com (nwk-mmpp-sz13.apple.com [17.128.115.216]) by ma1-mtap-s01.corp.apple.com (Oracle Communications Messaging Server 8.0.2.4.20190507 64bit (built May 7 2019)) with ESMTPS id <0Q4H00HWQHAUJ720@ma1-mtap-s01.corp.apple.com>; Tue, 21 Jan 2020 16:57:02 -0800 (PST) Received: from process_milters-daemon.nwk-mmpp-sz13.apple.com by nwk-mmpp-sz13.apple.com (Oracle Communications Messaging Server 8.0.2.4.20190507 64bit (built May 7 2019)) id <0Q4H00F00F5G4K00@nwk-mmpp-sz13.apple.com>; Tue, 21 Jan 2020 16:57:02 -0800 (PST) X-Va-A: X-Va-T-CD: 4b1e0bf36502e052fc75ad21b706ed24 X-Va-E-CD: 61e38c1bbc7fdb1247ea00964b0bc440 X-Va-R-CD: 3afddc4f4a9cd05e809fcf9591776af4 X-Va-CD: 0 X-Va-ID: b4a56e74-af19-4d30-9986-16dc7a389bdf X-V-A: X-V-T-CD: 4b1e0bf36502e052fc75ad21b706ed24 X-V-E-CD: 61e38c1bbc7fdb1247ea00964b0bc440 X-V-R-CD: 3afddc4f4a9cd05e809fcf9591776af4 X-V-CD: 0 X-V-ID: 77619e81-b082-4c5f-a6a1-831e82c3bfaf X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:,, definitions=2020-01-17_05:,, signatures=0 Received: from localhost ([17.192.155.241]) by nwk-mmpp-sz13.apple.com (Oracle Communications Messaging Server 8.0.2.4.20190507 64bit (built May 7 2019)) with ESMTPSA id <0Q4H00DSOHB0DC30@nwk-mmpp-sz13.apple.com>; Tue, 21 Jan 2020 16:57:00 -0800 (PST) Sender: cpaasch@apple.com From: Christoph Paasch To: netdev@vger.kernel.org Date: Tue, 21 Jan 2020 16:56:26 -0800 Message-id: <20200122005633.21229-13-cpaasch@apple.com> X-Mailer: git-send-email 2.23.0 In-reply-to: <20200122005633.21229-1-cpaasch@apple.com> References: <20200122005633.21229-1-cpaasch@apple.com> MIME-version: 1.0 X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:, , definitions=2020-01-17_05:, , signatures=0 Message-ID-Hash: IELXASN5OC3IKOGX7P2WUTOAWSW4S2QU X-Message-ID-Hash: IELXASN5OC3IKOGX7P2WUTOAWSW4S2QU X-MailFrom: cpaasch@apple.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; suspicious-header CC: mptcp@lists.01.org X-Mailman-Version: 3.1.1 Precedence: list Subject: [MPTCP] [PATCH net-next v3 12/19] mptcp: recvmsg() can drain data from multiple subflows List-Id: Discussions regarding MPTCP upstreaming Archived-At: List-Archive: List-Help: List-Post: List-Subscribe: List-Unsubscribe: From: Paolo Abeni With the previous patch in place, the msk can detect which subflow has the current map with a simple walk, let's update the main loop to always select the 'current' subflow. The exit conditions now closely mirror tcp_recvmsg() to get expected timeout and signal behavior. Co-developed-by: Peter Krystad Signed-off-by: Peter Krystad Co-developed-by: Davide Caratti Signed-off-by: Davide Caratti Co-developed-by: Matthieu Baerts Signed-off-by: Matthieu Baerts Co-developed-by: Mat Martineau Signed-off-by: Mat Martineau Co-developed-by: Florian Westphal Signed-off-by: Florian Westphal Signed-off-by: Paolo Abeni Signed-off-by: Christoph Paasch --- net/mptcp/protocol.c | 178 ++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 168 insertions(+), 10 deletions(-) diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c index 408efbe34753..ad9c73cc20e1 100644 --- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -9,6 +9,8 @@ #include #include #include +#include +#include #include #include #include @@ -105,6 +107,21 @@ static bool mptcp_ext_cache_refill(struct mptcp_sock *msk) return !!msk->cached_ext; } +static struct sock *mptcp_subflow_recv_lookup(const struct mptcp_sock *msk) +{ + struct mptcp_subflow_context *subflow; + struct sock *sk = (struct sock *)msk; + + sock_owned_by_me(sk); + + mptcp_for_each_subflow(msk, subflow) { + if (subflow->data_avail) + return mptcp_subflow_tcp_sock(subflow); + } + + return NULL; +} + static int mptcp_sendmsg_frag(struct sock *sk, struct sock *ssk, struct msghdr *msg, long *timeo) { @@ -269,13 +286,37 @@ int mptcp_read_actor(read_descriptor_t *desc, struct sk_buff *skb, return copy_len; } +static void mptcp_wait_data(struct sock *sk, long *timeo) +{ + DEFINE_WAIT_FUNC(wait, woken_wake_function); + struct mptcp_sock *msk = mptcp_sk(sk); + + add_wait_queue(sk_sleep(sk), &wait); + sk_set_bit(SOCKWQ_ASYNC_WAITDATA, sk); + + sk_wait_event(sk, timeo, + test_and_clear_bit(MPTCP_DATA_READY, &msk->flags), &wait); + + sk_clear_bit(SOCKWQ_ASYNC_WAITDATA, sk); + remove_wait_queue(sk_sleep(sk), &wait); +} + static int mptcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, int nonblock, int flags, int *addr_len) { struct mptcp_sock *msk = mptcp_sk(sk); + struct mptcp_subflow_context *subflow; + bool more_data_avail = false; + struct mptcp_read_arg arg; + read_descriptor_t desc; + bool wait_data = false; struct socket *ssock; + struct tcp_sock *tp; + bool done = false; struct sock *ssk; int copied = 0; + int target; + long timeo; if (msg->msg_flags & ~(MSG_WAITALL | MSG_DONTWAIT)) return -EOPNOTSUPP; @@ -290,16 +331,124 @@ static int mptcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, return copied; } - ssk = mptcp_subflow_get(msk); - if (!ssk) { - release_sock(sk); - return -ENOTCONN; + arg.msg = msg; + desc.arg.data = &arg; + desc.error = 0; + + timeo = sock_rcvtimeo(sk, nonblock); + + len = min_t(size_t, len, INT_MAX); + target = sock_rcvlowat(sk, flags & MSG_WAITALL, len); + + while (!done) { + u32 map_remaining; + int bytes_read; + + ssk = mptcp_subflow_recv_lookup(msk); + pr_debug("msk=%p ssk=%p", msk, ssk); + if (!ssk) + goto wait_for_data; + + subflow = mptcp_subflow_ctx(ssk); + tp = tcp_sk(ssk); + + lock_sock(ssk); + do { + /* try to read as much data as available */ + map_remaining = subflow->map_data_len - + mptcp_subflow_get_map_offset(subflow); + desc.count = min_t(size_t, len - copied, map_remaining); + pr_debug("reading %zu bytes, copied %d", desc.count, + copied); + bytes_read = tcp_read_sock(ssk, &desc, + mptcp_read_actor); + if (bytes_read < 0) { + if (!copied) + copied = bytes_read; + done = true; + goto next; + } + + pr_debug("msk ack_seq=%llx -> %llx", msk->ack_seq, + msk->ack_seq + bytes_read); + msk->ack_seq += bytes_read; + copied += bytes_read; + if (copied >= len) { + done = true; + goto next; + } + if (tp->urg_data && tp->urg_seq == tp->copied_seq) { + pr_err("Urgent data present, cannot proceed"); + done = true; + goto next; + } +next: + more_data_avail = mptcp_subflow_data_available(ssk); + } while (more_data_avail && !done); + release_sock(ssk); + continue; + +wait_for_data: + more_data_avail = false; + + /* only the master socket status is relevant here. The exit + * conditions mirror closely tcp_recvmsg() + */ + if (copied >= target) + break; + + if (copied) { + if (sk->sk_err || + sk->sk_state == TCP_CLOSE || + (sk->sk_shutdown & RCV_SHUTDOWN) || + !timeo || + signal_pending(current)) + break; + } else { + if (sk->sk_err) { + copied = sock_error(sk); + break; + } + + if (sk->sk_shutdown & RCV_SHUTDOWN) + break; + + if (sk->sk_state == TCP_CLOSE) { + copied = -ENOTCONN; + break; + } + + if (!timeo) { + copied = -EAGAIN; + break; + } + + if (signal_pending(current)) { + copied = sock_intr_errno(timeo); + break; + } + } + + pr_debug("block timeout %ld", timeo); + wait_data = true; + mptcp_wait_data(sk, &timeo); } - copied = sock_recvmsg(ssk->sk_socket, msg, flags); + if (more_data_avail) { + if (!test_bit(MPTCP_DATA_READY, &msk->flags)) + set_bit(MPTCP_DATA_READY, &msk->flags); + } else if (!wait_data) { + clear_bit(MPTCP_DATA_READY, &msk->flags); - release_sock(sk); + /* .. race-breaker: ssk might get new data after last + * data_available() returns false. + */ + ssk = mptcp_subflow_recv_lookup(msk); + if (unlikely(ssk)) + set_bit(MPTCP_DATA_READY, &msk->flags); + } + release_sock(sk); return copied; } @@ -460,10 +609,6 @@ static struct sock *mptcp_accept(struct sock *sk, int flags, int *err, msk->write_seq = subflow->idsn + 1; ack_seq++; msk->ack_seq = ack_seq; - subflow->map_seq = ack_seq; - subflow->map_subflow_seq = 1; - subflow->rel_write_seq = 1; - subflow->tcp_sock = ssk; newsk = new_mptcp_sock; mptcp_copy_inaddrs(newsk, ssk); list_add(&subflow->node, &msk->conn_list); @@ -475,6 +620,19 @@ static struct sock *mptcp_accept(struct sock *sk, int flags, int *err, bh_unlock_sock(new_mptcp_sock); local_bh_enable(); release_sock(sk); + + /* the subflow can already receive packet, avoid racing with + * the receive path and process the pending ones + */ + lock_sock(ssk); + subflow->map_seq = ack_seq; + subflow->map_subflow_seq = 1; + subflow->rel_write_seq = 1; + subflow->tcp_sock = ssk; + subflow->conn = new_mptcp_sock; + if (unlikely(!skb_queue_empty(&ssk->sk_receive_queue))) + mptcp_subflow_data_available(ssk); + release_sock(ssk); } return newsk; From patchwork Wed Jan 22 00:56:27 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Christoph Paasch X-Patchwork-Id: 1226854 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (no SPF record) smtp.mailfrom=lists.01.org (client-ip=2001:19d0:306:5::1; helo=ml01.01.org; envelope-from=mptcp-bounces@lists.01.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=quarantine dis=none) header.from=apple.com Authentication-Results: ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=apple.com header.i=@apple.com header.a=rsa-sha256 header.s=20180706 header.b=A0+MlTIe; dkim-atps=neutral Received: from ml01.01.org (ml01.01.org [IPv6:2001:19d0:306:5::1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 482Rnc2YPbz9sRl for ; Wed, 22 Jan 2020 11:57:08 +1100 (AEDT) Received: from ml01.vlan13.01.org (localhost [IPv6:::1]) by ml01.01.org (Postfix) with ESMTP id 000921007B8E1; Tue, 21 Jan 2020 17:00:24 -0800 (PST) Received-SPF: Pass (mailfrom) identity=mailfrom; client-ip=17.151.62.67; helo=nwk-aaemail-lapp02.apple.com; envelope-from=cpaasch@apple.com; receiver= Received: from nwk-aaemail-lapp02.apple.com (nwk-aaemail-lapp02.apple.com [17.151.62.67]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ml01.01.org (Postfix) with ESMTPS id DF7BF1007B8C8 for ; Tue, 21 Jan 2020 17:00:22 -0800 (PST) Received: from pps.filterd (nwk-aaemail-lapp02.apple.com [127.0.0.1]) by nwk-aaemail-lapp02.apple.com (8.16.0.27/8.16.0.27) with SMTP id 00M0v2eJ020009; Tue, 21 Jan 2020 16:57:04 -0800 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=apple.com; h=sender : from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding; s=20180706; bh=nqG6dk68pEmOIYmqkjlhBAXolRpLFOB8dqAyqxYZrIU=; b=A0+MlTIeU3bgRQFqEyyLBdsN3dI5nate1ZvN480fIfSmpoIGoRorfCf/8hwvLcjcryL/ Kb3XfOb33V2HNkoOblXCdGEJiJ1hClpJ/kll4UUW9bhSjD+alMaEIHghJvZq2xnmJbKg hEymzbZB9n/55MElwgVd4tYFGVwTc+pM9+dk7BMgzGyHAxkDf2ObX6mlRwRnuPzb7iXV cdcfT26RDmKMQJPN0XbvXIirxxpWTQQ5E9CYTZ9ZPVUBdjlPmtQjWRpOT9bLShsHT9wD 8hmVhFTbGtFOn8LQTZAlc7rcGMBP1sDMxACSt83CGX7m71MYQ3QP+bCJ/UZioK7gEWL6 og== Received: from ma1-mtap-s03.corp.apple.com (ma1-mtap-s03.corp.apple.com [17.40.76.7]) by nwk-aaemail-lapp02.apple.com with ESMTP id 2xkyfq8e4a-6 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NO); Tue, 21 Jan 2020 16:57:04 -0800 Received: from nwk-mmpp-sz13.apple.com (nwk-mmpp-sz13.apple.com [17.128.115.216]) by ma1-mtap-s03.corp.apple.com (Oracle Communications Messaging Server 8.0.2.4.20190507 64bit (built May 7 2019)) with ESMTPS id <0Q4H00064HAZD600@ma1-mtap-s03.corp.apple.com>; Tue, 21 Jan 2020 16:57:02 -0800 (PST) Received: from process_milters-daemon.nwk-mmpp-sz13.apple.com by nwk-mmpp-sz13.apple.com (Oracle Communications Messaging Server 8.0.2.4.20190507 64bit (built May 7 2019)) id <0Q4H00F00F5G4K00@nwk-mmpp-sz13.apple.com>; Tue, 21 Jan 2020 16:57:02 -0800 (PST) X-Va-A: X-Va-T-CD: 4b1e0bf36502e052fc75ad21b706ed24 X-Va-E-CD: dc7a0cfcd4f3e0cf0efdc41aa53bc138 X-Va-R-CD: 840b521611be5ac4c5a7cb6f20566d03 X-Va-CD: 0 X-Va-ID: 328199fb-b8e8-44ff-9071-1a7b2b487bb2 X-V-A: X-V-T-CD: 4b1e0bf36502e052fc75ad21b706ed24 X-V-E-CD: dc7a0cfcd4f3e0cf0efdc41aa53bc138 X-V-R-CD: 840b521611be5ac4c5a7cb6f20566d03 X-V-CD: 0 X-V-ID: 46174355-2834-4141-b72e-199f926c2792 X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:,, definitions=2020-01-17_05:,, signatures=0 Received: from localhost ([17.192.155.241]) by nwk-mmpp-sz13.apple.com (Oracle Communications Messaging Server 8.0.2.4.20190507 64bit (built May 7 2019)) with ESMTPSA id <0Q4H00DSQHB0DC30@nwk-mmpp-sz13.apple.com>; Tue, 21 Jan 2020 16:57:00 -0800 (PST) Sender: cpaasch@apple.com From: Christoph Paasch To: netdev@vger.kernel.org Date: Tue, 21 Jan 2020 16:56:27 -0800 Message-id: <20200122005633.21229-14-cpaasch@apple.com> X-Mailer: git-send-email 2.23.0 In-reply-to: <20200122005633.21229-1-cpaasch@apple.com> References: <20200122005633.21229-1-cpaasch@apple.com> MIME-version: 1.0 X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:, , definitions=2020-01-17_05:, , signatures=0 Message-ID-Hash: 4CECMHVFJWE7M4IQO3ZSSEOTRCFJAL55 X-Message-ID-Hash: 4CECMHVFJWE7M4IQO3ZSSEOTRCFJAL55 X-MailFrom: cpaasch@apple.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; suspicious-header CC: mptcp@lists.01.org X-Mailman-Version: 3.1.1 Precedence: list Subject: [MPTCP] [PATCH net-next v3 13/19] mptcp: allow collapsing consecutive sendpages on the same substream List-Id: Discussions regarding MPTCP upstreaming Archived-At: List-Archive: List-Help: List-Post: List-Subscribe: List-Unsubscribe: From: Paolo Abeni If the current sendmsg() lands on the same subflow we used last, we can try to collapse the data. Signed-off-by: Paolo Abeni Signed-off-by: Christoph Paasch --- net/mptcp/protocol.c | 75 +++++++++++++++++++++++++++++++++++--------- 1 file changed, 60 insertions(+), 15 deletions(-) diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c index ad9c73cc20e1..0b21ae25bd0f 100644 --- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -122,14 +122,27 @@ static struct sock *mptcp_subflow_recv_lookup(const struct mptcp_sock *msk) return NULL; } +static inline bool mptcp_skb_can_collapse_to(const struct mptcp_sock *msk, + const struct sk_buff *skb, + const struct mptcp_ext *mpext) +{ + if (!tcp_skb_can_collapse_to(skb)) + return false; + + /* can collapse only if MPTCP level sequence is in order */ + return mpext && mpext->data_seq + mpext->data_len == msk->write_seq; +} + static int mptcp_sendmsg_frag(struct sock *sk, struct sock *ssk, - struct msghdr *msg, long *timeo) + struct msghdr *msg, long *timeo, int *pmss_now, + int *ps_goal) { - int mss_now = 0, size_goal = 0, ret = 0; + int mss_now, avail_size, size_goal, ret; struct mptcp_sock *msk = mptcp_sk(sk); struct mptcp_ext *mpext = NULL; + struct sk_buff *skb, *tail; + bool can_collapse = false; struct page_frag *pfrag; - struct sk_buff *skb; size_t psize; /* use the mptcp page cache so that we can easily move the data @@ -145,8 +158,29 @@ static int mptcp_sendmsg_frag(struct sock *sk, struct sock *ssk, /* compute copy limit */ mss_now = tcp_send_mss(ssk, &size_goal, msg->msg_flags); - psize = min_t(int, pfrag->size - pfrag->offset, size_goal); + *pmss_now = mss_now; + *ps_goal = size_goal; + avail_size = size_goal; + skb = tcp_write_queue_tail(ssk); + if (skb) { + mpext = skb_ext_find(skb, SKB_EXT_MPTCP); + + /* Limit the write to the size available in the + * current skb, if any, so that we create at most a new skb. + * Explicitly tells TCP internals to avoid collapsing on later + * queue management operation, to avoid breaking the ext <-> + * SSN association set here + */ + can_collapse = (size_goal - skb->len > 0) && + mptcp_skb_can_collapse_to(msk, skb, mpext); + if (!can_collapse) + TCP_SKB_CB(skb)->eor = 1; + else + avail_size = size_goal - skb->len; + } + psize = min_t(size_t, pfrag->size - pfrag->offset, avail_size); + /* Copy to page */ pr_debug("left=%zu", msg_data_left(msg)); psize = copy_page_from_iter(pfrag->page, pfrag->offset, min_t(size_t, msg_data_left(msg), psize), @@ -155,14 +189,9 @@ static int mptcp_sendmsg_frag(struct sock *sk, struct sock *ssk, if (!psize) return -EINVAL; - /* Mark the end of the previous write so the beginning of the - * next write (with its own mptcp skb extension data) is not - * collapsed. + /* tell the TCP stack to delay the push so that we can safely + * access the skb after the sendpages call */ - skb = tcp_write_queue_tail(ssk); - if (skb) - TCP_SKB_CB(skb)->eor = 1; - ret = do_tcp_sendpages(ssk, pfrag->page, pfrag->offset, psize, msg->msg_flags | MSG_SENDPAGE_NOTLAST); if (ret <= 0) @@ -170,6 +199,18 @@ static int mptcp_sendmsg_frag(struct sock *sk, struct sock *ssk, if (unlikely(ret < psize)) iov_iter_revert(&msg->msg_iter, psize - ret); + /* if the tail skb extension is still the cached one, collapsing + * really happened. Note: we can't check for 'same skb' as the sk_buff + * hdr on tail can be transmitted, freed and re-allocated by the + * do_tcp_sendpages() call + */ + tail = tcp_write_queue_tail(ssk); + if (mpext && tail && mpext == skb_ext_find(tail, SKB_EXT_MPTCP)) { + WARN_ON_ONCE(!can_collapse); + mpext->data_len += ret; + goto out; + } + skb = tcp_write_queue_tail(ssk); mpext = __skb_ext_set(skb, SKB_EXT_MPTCP, msk->cached_ext); msk->cached_ext = NULL; @@ -185,11 +226,11 @@ static int mptcp_sendmsg_frag(struct sock *sk, struct sock *ssk, mpext->data_seq, mpext->subflow_seq, mpext->data_len, mpext->dsn64); +out: pfrag->offset += ret; msk->write_seq += ret; mptcp_subflow_ctx(ssk)->rel_write_seq += ret; - tcp_push(ssk, msg->msg_flags, mss_now, tcp_sk(ssk)->nonagle, size_goal); return ret; } @@ -212,11 +253,11 @@ static void ssk_check_wmem(struct mptcp_sock *msk, struct sock *ssk) static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len) { + int mss_now = 0, size_goal = 0, ret = 0; struct mptcp_sock *msk = mptcp_sk(sk); struct socket *ssock; size_t copied = 0; struct sock *ssk; - int ret = 0; long timeo; if (msg->msg_flags & ~(MSG_MORE | MSG_DONTWAIT | MSG_NOSIGNAL)) @@ -243,15 +284,19 @@ static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len) lock_sock(ssk); while (msg_data_left(msg)) { - ret = mptcp_sendmsg_frag(sk, ssk, msg, &timeo); + ret = mptcp_sendmsg_frag(sk, ssk, msg, &timeo, &mss_now, + &size_goal); if (ret < 0) break; copied += ret; } - if (copied > 0) + if (copied) { ret = copied; + tcp_push(ssk, msg->msg_flags, mss_now, tcp_sk(ssk)->nonagle, + size_goal); + } ssk_check_wmem(msk, ssk); release_sock(ssk); From patchwork Wed Jan 22 00:56:28 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Christoph Paasch X-Patchwork-Id: 1226867 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (no SPF record) smtp.mailfrom=lists.01.org (client-ip=198.145.21.10; helo=ml01.01.org; envelope-from=mptcp-bounces@lists.01.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=quarantine dis=none) header.from=apple.com Authentication-Results: ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=apple.com header.i=@apple.com header.a=rsa-sha256 header.s=20180706 header.b=uG0UpVKm; dkim-atps=neutral Received: from ml01.01.org (ml01.01.org [198.145.21.10]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 482Rpm3lY8z9sRX for ; Wed, 22 Jan 2020 11:58:08 +1100 (AEDT) Received: from ml01.vlan13.01.org (localhost [IPv6:::1]) by ml01.01.org (Postfix) with ESMTP id C88B71007B8E4; Tue, 21 Jan 2020 17:01:24 -0800 (PST) Received-SPF: Pass (mailfrom) identity=mailfrom; client-ip=17.151.62.67; helo=nwk-aaemail-lapp02.apple.com; envelope-from=cpaasch@apple.com; receiver= Received: from nwk-aaemail-lapp02.apple.com (nwk-aaemail-lapp02.apple.com [17.151.62.67]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ml01.01.org (Postfix) with ESMTPS id 512BC1007B8C3 for ; Tue, 21 Jan 2020 17:01:23 -0800 (PST) Received: from pps.filterd (nwk-aaemail-lapp02.apple.com [127.0.0.1]) by nwk-aaemail-lapp02.apple.com (8.16.0.27/8.16.0.27) with SMTP id 00M0v2eK020009; Tue, 21 Jan 2020 16:57:04 -0800 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=apple.com; h=sender : from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding; s=20180706; bh=/vHaqNOJUr6x7jbpQRexD3xzVPYtFNp5ZrzV4FKpNbs=; b=uG0UpVKmAeXBjVknsYpISDuhA1tJo2tb8JDWXsi2/p4DjCNdOxbq6ysIiWFxmfJM3zPC 8yH1KUtRT8iIkHDhKZHhVfSiTSF1UVmgu7IyedkGqeYyaRC6Xvp8FnV7bJnxyNqHF0PY GRMAcz+GffzLrfHXTkR7emlC7sr29r507WqowI+ZrnhdO9/IpLWJBIz7KKoaoH3Te+wu rTZMQC8ERW6OHQcigpSJ4frcCoAkNo9MRsl5fBQjZYjxS8Fsdn6rtSiizKhhn823wNqF ggWHg1USankggiSQtwhh9fUtUUj58lfr3sxXXssBDNcOWn94N3OkeT7OV2MdA9ThCaaa Yg== Received: from ma1-mtap-s03.corp.apple.com (ma1-mtap-s03.corp.apple.com [17.40.76.7]) by nwk-aaemail-lapp02.apple.com with ESMTP id 2xkyfq8e4a-7 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NO); Tue, 21 Jan 2020 16:57:04 -0800 Received: from nwk-mmpp-sz13.apple.com (nwk-mmpp-sz13.apple.com [17.128.115.216]) by ma1-mtap-s03.corp.apple.com (Oracle Communications Messaging Server 8.0.2.4.20190507 64bit (built May 7 2019)) with ESMTPS id <0Q4H00064HAZD600@ma1-mtap-s03.corp.apple.com>; Tue, 21 Jan 2020 16:57:03 -0800 (PST) Received: from process_milters-daemon.nwk-mmpp-sz13.apple.com by nwk-mmpp-sz13.apple.com (Oracle Communications Messaging Server 8.0.2.4.20190507 64bit (built May 7 2019)) id <0Q4H00F00F5G4K00@nwk-mmpp-sz13.apple.com>; Tue, 21 Jan 2020 16:57:02 -0800 (PST) X-Va-A: X-Va-T-CD: 4b1e0bf36502e052fc75ad21b706ed24 X-Va-E-CD: d36fbbd85c5e341840c93f4d915021fa X-Va-R-CD: e0a733c2d42bf462324452628d481b4c X-Va-CD: 0 X-Va-ID: d6e559b7-cd00-46c0-b556-0f45292c19a4 X-V-A: X-V-T-CD: 4b1e0bf36502e052fc75ad21b706ed24 X-V-E-CD: d36fbbd85c5e341840c93f4d915021fa X-V-R-CD: e0a733c2d42bf462324452628d481b4c X-V-CD: 0 X-V-ID: 94ad6a18-3510-47dc-952e-7043d38af389 X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:,, definitions=2020-01-17_05:,, signatures=0 Received: from localhost ([17.192.155.241]) by nwk-mmpp-sz13.apple.com (Oracle Communications Messaging Server 8.0.2.4.20190507 64bit (built May 7 2019)) with ESMTPSA id <0Q4H00DSSHB0DC30@nwk-mmpp-sz13.apple.com>; Tue, 21 Jan 2020 16:57:00 -0800 (PST) Sender: cpaasch@apple.com From: Christoph Paasch To: netdev@vger.kernel.org Date: Tue, 21 Jan 2020 16:56:28 -0800 Message-id: <20200122005633.21229-15-cpaasch@apple.com> X-Mailer: git-send-email 2.23.0 In-reply-to: <20200122005633.21229-1-cpaasch@apple.com> References: <20200122005633.21229-1-cpaasch@apple.com> MIME-version: 1.0 X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:, , definitions=2020-01-17_05:, , signatures=0 Message-ID-Hash: B77463BBLLR5IHKJ5C3PSSV2PYYRIQJQ X-Message-ID-Hash: B77463BBLLR5IHKJ5C3PSSV2PYYRIQJQ X-MailFrom: cpaasch@apple.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; suspicious-header CC: mptcp@lists.01.org X-Mailman-Version: 3.1.1 Precedence: list Subject: [MPTCP] [PATCH net-next v3 14/19] mptcp: new sysctl to control the activation per NS List-Id: Discussions regarding MPTCP upstreaming Archived-At: List-Archive: List-Help: List-Post: List-Subscribe: List-Unsubscribe: From: Matthieu Baerts New MPTCP sockets will return -ENOPROTOOPT if MPTCP support is disabled for the current net namespace. We are providing here a way to control access to the feature for those that need to turn it on or off. The value of this new sysctl can be different per namespace. We can then restrict the usage of MPTCP to the selected NS. In case of serious issues with MPTCP, administrators can now easily turn MPTCP off. Co-developed-by: Peter Krystad Signed-off-by: Peter Krystad Signed-off-by: Matthieu Baerts Signed-off-by: Christoph Paasch --- net/mptcp/Makefile | 2 +- net/mptcp/ctrl.c | 130 +++++++++++++++++++++++++++++++++++++++++++ net/mptcp/protocol.c | 16 ++++-- net/mptcp/protocol.h | 3 + 4 files changed, 146 insertions(+), 5 deletions(-) create mode 100644 net/mptcp/ctrl.c diff --git a/net/mptcp/Makefile b/net/mptcp/Makefile index 178ae81d8b66..4e98d9edfd0a 100644 --- a/net/mptcp/Makefile +++ b/net/mptcp/Makefile @@ -1,4 +1,4 @@ # SPDX-License-Identifier: GPL-2.0 obj-$(CONFIG_MPTCP) += mptcp.o -mptcp-y := protocol.o subflow.o options.o token.o crypto.o +mptcp-y := protocol.o subflow.o options.o token.o crypto.o ctrl.o diff --git a/net/mptcp/ctrl.c b/net/mptcp/ctrl.c new file mode 100644 index 000000000000..8e39585d37f3 --- /dev/null +++ b/net/mptcp/ctrl.c @@ -0,0 +1,130 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Multipath TCP + * + * Copyright (c) 2019, Tessares SA. + */ + +#include + +#include +#include + +#include "protocol.h" + +#define MPTCP_SYSCTL_PATH "net/mptcp" + +static int mptcp_pernet_id; +struct mptcp_pernet { + struct ctl_table_header *ctl_table_hdr; + + int mptcp_enabled; +}; + +static struct mptcp_pernet *mptcp_get_pernet(struct net *net) +{ + return net_generic(net, mptcp_pernet_id); +} + +int mptcp_is_enabled(struct net *net) +{ + return mptcp_get_pernet(net)->mptcp_enabled; +} + +static struct ctl_table mptcp_sysctl_table[] = { + { + .procname = "enabled", + .maxlen = sizeof(int), + .mode = 0644, + /* users with CAP_NET_ADMIN or root (not and) can change this + * value, same as other sysctl or the 'net' tree. + */ + .proc_handler = proc_dointvec, + }, + {} +}; + +static void mptcp_pernet_set_defaults(struct mptcp_pernet *pernet) +{ + pernet->mptcp_enabled = 1; +} + +static int mptcp_pernet_new_table(struct net *net, struct mptcp_pernet *pernet) +{ + struct ctl_table_header *hdr; + struct ctl_table *table; + + table = mptcp_sysctl_table; + if (!net_eq(net, &init_net)) { + table = kmemdup(table, sizeof(mptcp_sysctl_table), GFP_KERNEL); + if (!table) + goto err_alloc; + } + + table[0].data = &pernet->mptcp_enabled; + + hdr = register_net_sysctl(net, MPTCP_SYSCTL_PATH, table); + if (!hdr) + goto err_reg; + + pernet->ctl_table_hdr = hdr; + + return 0; + +err_reg: + if (!net_eq(net, &init_net)) + kfree(table); +err_alloc: + return -ENOMEM; +} + +static void mptcp_pernet_del_table(struct mptcp_pernet *pernet) +{ + struct ctl_table *table = pernet->ctl_table_hdr->ctl_table_arg; + + unregister_net_sysctl_table(pernet->ctl_table_hdr); + + kfree(table); +} + +static int __net_init mptcp_net_init(struct net *net) +{ + struct mptcp_pernet *pernet = mptcp_get_pernet(net); + + mptcp_pernet_set_defaults(pernet); + + return mptcp_pernet_new_table(net, pernet); +} + +/* Note: the callback will only be called per extra netns */ +static void __net_exit mptcp_net_exit(struct net *net) +{ + struct mptcp_pernet *pernet = mptcp_get_pernet(net); + + mptcp_pernet_del_table(pernet); +} + +static struct pernet_operations mptcp_pernet_ops = { + .init = mptcp_net_init, + .exit = mptcp_net_exit, + .id = &mptcp_pernet_id, + .size = sizeof(struct mptcp_pernet), +}; + +void __init mptcp_init(void) +{ + mptcp_proto_init(); + + if (register_pernet_subsys(&mptcp_pernet_ops) < 0) + panic("Failed to register MPTCP pernet subsystem.\n"); +} + +#if IS_ENABLED(CONFIG_MPTCP_IPV6) +int __init mptcpv6_init(void) +{ + int err; + + err = mptcp_proto_v6_init(); + + return err; +} +#endif diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c index 0b21ae25bd0f..45e482864a19 100644 --- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -522,7 +522,7 @@ static void __mptcp_close_ssk(struct sock *sk, struct sock *ssk, } } -static int mptcp_init_sock(struct sock *sk) +static int __mptcp_init_sock(struct sock *sk) { struct mptcp_sock *msk = mptcp_sk(sk); @@ -532,6 +532,14 @@ static int mptcp_init_sock(struct sock *sk) return 0; } +static int mptcp_init_sock(struct sock *sk) +{ + if (!mptcp_is_enabled(sock_net(sk))) + return -ENOPROTOOPT; + + return __mptcp_init_sock(sk); +} + static void mptcp_subflow_shutdown(struct sock *ssk, int how) { lock_sock(ssk); @@ -640,7 +648,7 @@ static struct sock *mptcp_accept(struct sock *sk, int flags, int *err, return NULL; } - mptcp_init_sock(new_mptcp_sock); + __mptcp_init_sock(new_mptcp_sock); msk = mptcp_sk(new_mptcp_sock); msk->remote_key = subflow->remote_key; @@ -1078,7 +1086,7 @@ static struct inet_protosw mptcp_protosw = { .flags = INET_PROTOSW_ICSK, }; -void __init mptcp_init(void) +void mptcp_proto_init(void) { mptcp_prot.h.hashinfo = tcp_prot.h.hashinfo; mptcp_stream_ops = inet_stream_ops; @@ -1116,7 +1124,7 @@ static struct inet_protosw mptcp_v6_protosw = { .flags = INET_PROTOSW_ICSK, }; -int mptcpv6_init(void) +int mptcp_proto_v6_init(void) { int err; diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h index 59a83eb64d37..a8fad7d78565 100644 --- a/net/mptcp/protocol.h +++ b/net/mptcp/protocol.h @@ -179,6 +179,9 @@ extern const struct inet_connection_sock_af_ops ipv6_specific; #endif void mptcp_proto_init(void); +#if IS_ENABLED(CONFIG_MPTCP_IPV6) +int mptcp_proto_v6_init(void); +#endif struct mptcp_read_arg { struct msghdr *msg; From patchwork Wed Jan 22 00:56:29 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Christoph Paasch X-Patchwork-Id: 1226861 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (no SPF record) smtp.mailfrom=lists.01.org (client-ip=2001:19d0:306:5::1; helo=ml01.01.org; envelope-from=mptcp-bounces@lists.01.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=quarantine dis=none) header.from=apple.com Authentication-Results: ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=apple.com header.i=@apple.com header.a=rsa-sha256 header.s=20180706 header.b=jsvdibZQ; dkim-atps=neutral Received: from ml01.01.org (ml01.01.org [IPv6:2001:19d0:306:5::1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 482Rng3nH2z9sRG for ; Wed, 22 Jan 2020 11:57:11 +1100 (AEDT) Received: from ml01.vlan13.01.org (localhost [IPv6:::1]) by ml01.01.org (Postfix) with ESMTP id 5A3621007B8EA; Tue, 21 Jan 2020 17:00:28 -0800 (PST) Received-SPF: Pass (mailfrom) identity=mailfrom; client-ip=17.151.62.66; helo=nwk-aaemail-lapp01.apple.com; envelope-from=cpaasch@apple.com; receiver= Received: from nwk-aaemail-lapp01.apple.com (nwk-aaemail-lapp01.apple.com [17.151.62.66]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ml01.01.org (Postfix) with ESMTPS id 1B19B1007B8E9 for ; Tue, 21 Jan 2020 17:00:25 -0800 (PST) Received: from pps.filterd (nwk-aaemail-lapp01.apple.com [127.0.0.1]) by nwk-aaemail-lapp01.apple.com (8.16.0.27/8.16.0.27) with SMTP id 00M0urWv049855; Tue, 21 Jan 2020 16:57:06 -0800 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=apple.com; h=sender : from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding; s=20180706; bh=//uqYrn0ZZK6GC7ivx8zPZo4dhzAkWa3wN819DV4eEo=; b=jsvdibZQ1zMBdjwHUubfRS5EFKdeCX1z5aWSvD/kkypwbBGkALRAKx8ciRBSco2+RzNx iilFmVY4C5tpfjWCaPzUjSD2imvXUjWXi1dHSaFubs/u+oKBNCqJW6gWi1SXlVRH6zhr QWzk6Xv3Eiv2u8rMSFi+U4uV0MgiU+OBVpz0IbIVvYQhU2GgRUpaUaYt9X0GxFWsCmDC NzDcCXGZ7v89zFTeSb0ibFdI6MF/AQXlctkgOA5lNwbNMeAvgL437RwVsHXpVXA+gmma pRYU4xjTf1hEFk8KOzJ6YjCGGXNPXinZy5NZP1k5KjUclcCgS0qDOohfhoMkxz4EB2bK ig== Received: from ma1-mtap-s01.corp.apple.com (ma1-mtap-s01.corp.apple.com [17.40.76.5]) by nwk-aaemail-lapp01.apple.com with ESMTP id 2xm1u51ya3-4 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NO); Tue, 21 Jan 2020 16:57:06 -0800 Received: from nwk-mmpp-sz13.apple.com (nwk-mmpp-sz13.apple.com [17.128.115.216]) by ma1-mtap-s01.corp.apple.com (Oracle Communications Messaging Server 8.0.2.4.20190507 64bit (built May 7 2019)) with ESMTPS id <0Q4H00HWQHAUJ720@ma1-mtap-s01.corp.apple.com>; Tue, 21 Jan 2020 16:57:03 -0800 (PST) Received: from process_milters-daemon.nwk-mmpp-sz13.apple.com by nwk-mmpp-sz13.apple.com (Oracle Communications Messaging Server 8.0.2.4.20190507 64bit (built May 7 2019)) id <0Q4H00F00F5G4K00@nwk-mmpp-sz13.apple.com>; Tue, 21 Jan 2020 16:57:02 -0800 (PST) X-Va-A: X-Va-T-CD: 4b1e0bf36502e052fc75ad21b706ed24 X-Va-E-CD: f5943885b303410f1823751c89c99561 X-Va-R-CD: cafe44edda8a810d3d8ff86ddeda1264 X-Va-CD: 0 X-Va-ID: da9573dc-e582-4ef1-9454-26fa80f41b68 X-V-A: X-V-T-CD: 4b1e0bf36502e052fc75ad21b706ed24 X-V-E-CD: f5943885b303410f1823751c89c99561 X-V-R-CD: cafe44edda8a810d3d8ff86ddeda1264 X-V-CD: 0 X-V-ID: 04b7d742-6c44-48d5-944f-26468190f201 X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:,, definitions=2020-01-17_05:,, signatures=0 Received: from localhost ([17.192.155.241]) by nwk-mmpp-sz13.apple.com (Oracle Communications Messaging Server 8.0.2.4.20190507 64bit (built May 7 2019)) with ESMTPSA id <0Q4H00DSUHB0DC30@nwk-mmpp-sz13.apple.com>; Tue, 21 Jan 2020 16:57:01 -0800 (PST) Sender: cpaasch@apple.com From: Christoph Paasch To: netdev@vger.kernel.org Date: Tue, 21 Jan 2020 16:56:29 -0800 Message-id: <20200122005633.21229-16-cpaasch@apple.com> X-Mailer: git-send-email 2.23.0 In-reply-to: <20200122005633.21229-1-cpaasch@apple.com> References: <20200122005633.21229-1-cpaasch@apple.com> MIME-version: 1.0 X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:, , definitions=2020-01-17_05:, , signatures=0 Message-ID-Hash: SC4MIDH4LYSS2VPYSSXX7LJDTQJ2ZYW2 X-Message-ID-Hash: SC4MIDH4LYSS2VPYSSXX7LJDTQJ2ZYW2 X-MailFrom: cpaasch@apple.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; suspicious-header CC: mptcp@lists.01.org X-Mailman-Version: 3.1.1 Precedence: list Subject: [MPTCP] [PATCH net-next v3 15/19] mptcp: add basic kselftest for mptcp List-Id: Discussions regarding MPTCP upstreaming Archived-At: List-Archive: List-Help: List-Post: List-Subscribe: List-Unsubscribe: From: Florian Westphal Add mptcp_connect tool: xmit two files back and forth between two processes, several net namespaces including some adding delays, losses and reordering. Wrapper script tests that data was transmitted without corruption. The "-c" command line option for mptcp_connect.sh is there for debugging: The script will use tcpdump to create one .pcap file per test case, named according to the namespaces, protocols, and connect address in use. For example, the first test case writes the capture to ns1-ns1-MPTCP-MPTCP-10.0.1.1.pcap. The stderr output from tcpdump is printed after the test completes to show tcpdump's "packets dropped by kernel" information. Also check that userspace can't create MPTCP sockets when mptcp.enabled sysctl is off. The "-b" option allows to tune/lower send buffer size. "-m mmap" can be used to test blocking io. Default is non-blocking io using read/write/poll. Will run automatically on "make kselftest". Note that the default timeout of 45 seconds is used even if there is a "settings" changing it to 450. 45 seconds should be enough in most cases but this depends on the machine running the tests. A fix to correctly read the "settings" file has been proposed upstream but not applied yet. It is not blocking the execution of these new tests but it would be nice to have it: https://patchwork.kernel.org/patch/11204935/ Co-developed-by: Paolo Abeni Signed-off-by: Paolo Abeni Co-developed-by: Mat Martineau Signed-off-by: Mat Martineau Co-developed-by: Matthieu Baerts Signed-off-by: Matthieu Baerts Co-developed-by: Davide Caratti Signed-off-by: Davide Caratti Signed-off-by: Florian Westphal Signed-off-by: Christoph Paasch --- MAINTAINERS | 1 + tools/testing/selftests/Makefile | 1 + tools/testing/selftests/net/mptcp/.gitignore | 2 + tools/testing/selftests/net/mptcp/Makefile | 13 + tools/testing/selftests/net/mptcp/config | 4 + .../selftests/net/mptcp/mptcp_connect.c | 832 ++++++++++++++++++ .../selftests/net/mptcp/mptcp_connect.sh | 595 +++++++++++++ tools/testing/selftests/net/mptcp/settings | 1 + 8 files changed, 1449 insertions(+) create mode 100644 tools/testing/selftests/net/mptcp/.gitignore create mode 100644 tools/testing/selftests/net/mptcp/Makefile create mode 100644 tools/testing/selftests/net/mptcp/config create mode 100644 tools/testing/selftests/net/mptcp/mptcp_connect.c create mode 100755 tools/testing/selftests/net/mptcp/mptcp_connect.sh create mode 100644 tools/testing/selftests/net/mptcp/settings diff --git a/MAINTAINERS b/MAINTAINERS index 4be9c1b50183..e7463b46565b 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -11584,6 +11584,7 @@ B: https://github.com/multipath-tcp/mptcp_net-next/issues S: Maintained F: include/net/mptcp.h F: net/mptcp/ +F: tools/testing/selftests/net/mptcp/ NETWORKING [TCP] M: Eric Dumazet diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile index b001c602414b..b45a8f77ee24 100644 --- a/tools/testing/selftests/Makefile +++ b/tools/testing/selftests/Makefile @@ -32,6 +32,7 @@ TARGETS += memory-hotplug TARGETS += mount TARGETS += mqueue TARGETS += net +TARGETS += net/mptcp TARGETS += netfilter TARGETS += networking/timestamping TARGETS += nsfs diff --git a/tools/testing/selftests/net/mptcp/.gitignore b/tools/testing/selftests/net/mptcp/.gitignore new file mode 100644 index 000000000000..d72f07642738 --- /dev/null +++ b/tools/testing/selftests/net/mptcp/.gitignore @@ -0,0 +1,2 @@ +mptcp_connect +*.pcap diff --git a/tools/testing/selftests/net/mptcp/Makefile b/tools/testing/selftests/net/mptcp/Makefile new file mode 100644 index 000000000000..93de52016dde --- /dev/null +++ b/tools/testing/selftests/net/mptcp/Makefile @@ -0,0 +1,13 @@ +# SPDX-License-Identifier: GPL-2.0 + +top_srcdir = ../../../../.. + +CFLAGS = -Wall -Wl,--no-as-needed -O2 -g + +TEST_PROGS := mptcp_connect.sh + +TEST_GEN_FILES = mptcp_connect + +EXTRA_CLEAN := *.pcap + +include ../../lib.mk diff --git a/tools/testing/selftests/net/mptcp/config b/tools/testing/selftests/net/mptcp/config new file mode 100644 index 000000000000..2499824d9e1c --- /dev/null +++ b/tools/testing/selftests/net/mptcp/config @@ -0,0 +1,4 @@ +CONFIG_MPTCP=y +CONFIG_MPTCP_IPV6=y +CONFIG_VETH=y +CONFIG_NET_SCH_NETEM=m diff --git a/tools/testing/selftests/net/mptcp/mptcp_connect.c b/tools/testing/selftests/net/mptcp/mptcp_connect.c new file mode 100644 index 000000000000..a3dccd816ae4 --- /dev/null +++ b/tools/testing/selftests/net/mptcp/mptcp_connect.c @@ -0,0 +1,832 @@ +// SPDX-License-Identifier: GPL-2.0 + +#define _GNU_SOURCE + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include +#include +#include +#include +#include + +#include +#include + +#include + +extern int optind; + +#ifndef IPPROTO_MPTCP +#define IPPROTO_MPTCP 262 +#endif +#ifndef TCP_ULP +#define TCP_ULP 31 +#endif + +static bool listen_mode; +static int poll_timeout; + +enum cfg_mode { + CFG_MODE_POLL, + CFG_MODE_MMAP, + CFG_MODE_SENDFILE, +}; + +static enum cfg_mode cfg_mode = CFG_MODE_POLL; +static const char *cfg_host; +static const char *cfg_port = "12000"; +static int cfg_sock_proto = IPPROTO_MPTCP; +static bool tcpulp_audit; +static int pf = AF_INET; +static int cfg_sndbuf; + +static void die_usage(void) +{ + fprintf(stderr, "Usage: mptcp_connect [-6] [-u] [-s MPTCP|TCP] [-p port] -m mode]" + "[ -l ] [ -t timeout ] connect_address\n"); + exit(1); +} + +static const char *getxinfo_strerr(int err) +{ + if (err == EAI_SYSTEM) + return strerror(errno); + + return gai_strerror(err); +} + +static void xgetnameinfo(const struct sockaddr *addr, socklen_t addrlen, + char *host, socklen_t hostlen, + char *serv, socklen_t servlen) +{ + int flags = NI_NUMERICHOST | NI_NUMERICSERV; + int err = getnameinfo(addr, addrlen, host, hostlen, serv, servlen, + flags); + + if (err) { + const char *errstr = getxinfo_strerr(err); + + fprintf(stderr, "Fatal: getnameinfo: %s\n", errstr); + exit(1); + } +} + +static void xgetaddrinfo(const char *node, const char *service, + const struct addrinfo *hints, + struct addrinfo **res) +{ + int err = getaddrinfo(node, service, hints, res); + + if (err) { + const char *errstr = getxinfo_strerr(err); + + fprintf(stderr, "Fatal: getaddrinfo(%s:%s): %s\n", + node ? node : "", service ? service : "", errstr); + exit(1); + } +} + +static void set_sndbuf(int fd, unsigned int size) +{ + int err; + + err = setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &size, sizeof(size)); + if (err) { + perror("set SO_SNDBUF"); + exit(1); + } +} + +static int sock_listen_mptcp(const char * const listenaddr, + const char * const port) +{ + int sock; + struct addrinfo hints = { + .ai_protocol = IPPROTO_TCP, + .ai_socktype = SOCK_STREAM, + .ai_flags = AI_PASSIVE | AI_NUMERICHOST + }; + + hints.ai_family = pf; + + struct addrinfo *a, *addr; + int one = 1; + + xgetaddrinfo(listenaddr, port, &hints, &addr); + hints.ai_family = pf; + + for (a = addr; a; a = a->ai_next) { + sock = socket(a->ai_family, a->ai_socktype, cfg_sock_proto); + if (sock < 0) + continue; + + if (-1 == setsockopt(sock, SOL_SOCKET, SO_REUSEADDR, &one, + sizeof(one))) + perror("setsockopt"); + + if (bind(sock, a->ai_addr, a->ai_addrlen) == 0) + break; /* success */ + + perror("bind"); + close(sock); + sock = -1; + } + + freeaddrinfo(addr); + + if (sock < 0) { + fprintf(stderr, "Could not create listen socket\n"); + return sock; + } + + if (listen(sock, 20)) { + perror("listen"); + close(sock); + return -1; + } + + return sock; +} + +static bool sock_test_tcpulp(const char * const remoteaddr, + const char * const port) +{ + struct addrinfo hints = { + .ai_protocol = IPPROTO_TCP, + .ai_socktype = SOCK_STREAM, + }; + struct addrinfo *a, *addr; + int sock = -1, ret = 0; + bool test_pass = false; + + hints.ai_family = AF_INET; + + xgetaddrinfo(remoteaddr, port, &hints, &addr); + for (a = addr; a; a = a->ai_next) { + sock = socket(a->ai_family, a->ai_socktype, IPPROTO_TCP); + if (sock < 0) { + perror("socket"); + continue; + } + ret = setsockopt(sock, IPPROTO_TCP, TCP_ULP, "mptcp", + sizeof("mptcp")); + if (ret == -1 && errno == EOPNOTSUPP) + test_pass = true; + close(sock); + + if (test_pass) + break; + if (!ret) + fprintf(stderr, + "setsockopt(TCP_ULP) returned 0\n"); + else + perror("setsockopt(TCP_ULP)"); + } + return test_pass; +} + +static int sock_connect_mptcp(const char * const remoteaddr, + const char * const port, int proto) +{ + struct addrinfo hints = { + .ai_protocol = IPPROTO_TCP, + .ai_socktype = SOCK_STREAM, + }; + struct addrinfo *a, *addr; + int sock = -1; + + hints.ai_family = pf; + + xgetaddrinfo(remoteaddr, port, &hints, &addr); + for (a = addr; a; a = a->ai_next) { + sock = socket(a->ai_family, a->ai_socktype, proto); + if (sock < 0) { + perror("socket"); + continue; + } + + if (connect(sock, a->ai_addr, a->ai_addrlen) == 0) + break; /* success */ + + perror("connect()"); + close(sock); + sock = -1; + } + + freeaddrinfo(addr); + return sock; +} + +static size_t do_rnd_write(const int fd, char *buf, const size_t len) +{ + unsigned int do_w; + ssize_t bw; + + do_w = rand() & 0xffff; + if (do_w == 0 || do_w > len) + do_w = len; + + bw = write(fd, buf, do_w); + if (bw < 0) + perror("write"); + + return bw; +} + +static size_t do_write(const int fd, char *buf, const size_t len) +{ + size_t offset = 0; + + while (offset < len) { + size_t written; + ssize_t bw; + + bw = write(fd, buf + offset, len - offset); + if (bw < 0) { + perror("write"); + return 0; + } + + written = (size_t)bw; + offset += written; + } + + return offset; +} + +static ssize_t do_rnd_read(const int fd, char *buf, const size_t len) +{ + size_t cap = rand(); + + cap &= 0xffff; + + if (cap == 0) + cap = 1; + else if (cap > len) + cap = len; + + return read(fd, buf, cap); +} + +static void set_nonblock(int fd) +{ + int flags = fcntl(fd, F_GETFL); + + if (flags == -1) + return; + + fcntl(fd, F_SETFL, flags | O_NONBLOCK); +} + +static int copyfd_io_poll(int infd, int peerfd, int outfd) +{ + struct pollfd fds = { + .fd = peerfd, + .events = POLLIN | POLLOUT, + }; + unsigned int woff = 0, wlen = 0; + char wbuf[8192]; + + set_nonblock(peerfd); + + for (;;) { + char rbuf[8192]; + ssize_t len; + + if (fds.events == 0) + break; + + switch (poll(&fds, 1, poll_timeout)) { + case -1: + if (errno == EINTR) + continue; + perror("poll"); + return 1; + case 0: + fprintf(stderr, "%s: poll timed out (events: " + "POLLIN %u, POLLOUT %u)\n", __func__, + fds.events & POLLIN, fds.events & POLLOUT); + return 2; + } + + if (fds.revents & POLLIN) { + len = do_rnd_read(peerfd, rbuf, sizeof(rbuf)); + if (len == 0) { + /* no more data to receive: + * peer has closed its write side + */ + fds.events &= ~POLLIN; + + if ((fds.events & POLLOUT) == 0) + /* and nothing more to send */ + break; + + /* Else, still have data to transmit */ + } else if (len < 0) { + perror("read"); + return 3; + } + + do_write(outfd, rbuf, len); + } + + if (fds.revents & POLLOUT) { + if (wlen == 0) { + woff = 0; + wlen = read(infd, wbuf, sizeof(wbuf)); + } + + if (wlen > 0) { + ssize_t bw; + + bw = do_rnd_write(peerfd, wbuf + woff, wlen); + if (bw < 0) + return 111; + + woff += bw; + wlen -= bw; + } else if (wlen == 0) { + /* We have no more data to send. */ + fds.events &= ~POLLOUT; + + if ((fds.events & POLLIN) == 0) + /* ... and peer also closed already */ + break; + + /* ... but we still receive. + * Close our write side. + */ + shutdown(peerfd, SHUT_WR); + } else { + if (errno == EINTR) + continue; + perror("read"); + return 4; + } + } + + if (fds.revents & (POLLERR | POLLNVAL)) { + fprintf(stderr, "Unexpected revents: " + "POLLERR/POLLNVAL(%x)\n", fds.revents); + return 5; + } + } + + close(peerfd); + return 0; +} + +static int do_recvfile(int infd, int outfd) +{ + ssize_t r; + + do { + char buf[16384]; + + r = do_rnd_read(infd, buf, sizeof(buf)); + if (r > 0) { + if (write(outfd, buf, r) != r) + break; + } else if (r < 0) { + perror("read"); + } + } while (r > 0); + + return (int)r; +} + +static int do_mmap(int infd, int outfd, unsigned int size) +{ + char *inbuf = mmap(NULL, size, PROT_READ, MAP_SHARED, infd, 0); + ssize_t ret = 0, off = 0; + size_t rem; + + if (inbuf == MAP_FAILED) { + perror("mmap"); + return 1; + } + + rem = size; + + while (rem > 0) { + ret = write(outfd, inbuf + off, rem); + + if (ret < 0) { + perror("write"); + break; + } + + off += ret; + rem -= ret; + } + + munmap(inbuf, size); + return rem; +} + +static int get_infd_size(int fd) +{ + struct stat sb; + ssize_t count; + int err; + + err = fstat(fd, &sb); + if (err < 0) { + perror("fstat"); + return -1; + } + + if ((sb.st_mode & S_IFMT) != S_IFREG) { + fprintf(stderr, "%s: stdin is not a regular file\n", __func__); + return -2; + } + + count = sb.st_size; + if (count > INT_MAX) { + fprintf(stderr, "File too large: %zu\n", count); + return -3; + } + + return (int)count; +} + +static int do_sendfile(int infd, int outfd, unsigned int count) +{ + while (count > 0) { + ssize_t r; + + r = sendfile(outfd, infd, NULL, count); + if (r < 0) { + perror("sendfile"); + return 3; + } + + count -= r; + } + + return 0; +} + +static int copyfd_io_mmap(int infd, int peerfd, int outfd, + unsigned int size) +{ + int err; + + if (listen_mode) { + err = do_recvfile(peerfd, outfd); + if (err) + return err; + + err = do_mmap(infd, peerfd, size); + } else { + err = do_mmap(infd, peerfd, size); + if (err) + return err; + + shutdown(peerfd, SHUT_WR); + + err = do_recvfile(peerfd, outfd); + } + + return err; +} + +static int copyfd_io_sendfile(int infd, int peerfd, int outfd, + unsigned int size) +{ + int err; + + if (listen_mode) { + err = do_recvfile(peerfd, outfd); + if (err) + return err; + + err = do_sendfile(infd, peerfd, size); + } else { + err = do_sendfile(infd, peerfd, size); + if (err) + return err; + err = do_recvfile(peerfd, outfd); + } + + return err; +} + +static int copyfd_io(int infd, int peerfd, int outfd) +{ + int file_size; + + switch (cfg_mode) { + case CFG_MODE_POLL: + return copyfd_io_poll(infd, peerfd, outfd); + case CFG_MODE_MMAP: + file_size = get_infd_size(infd); + if (file_size < 0) + return file_size; + return copyfd_io_mmap(infd, peerfd, outfd, file_size); + case CFG_MODE_SENDFILE: + file_size = get_infd_size(infd); + if (file_size < 0) + return file_size; + return copyfd_io_sendfile(infd, peerfd, outfd, file_size); + } + + fprintf(stderr, "Invalid mode %d\n", cfg_mode); + + die_usage(); + return 1; +} + +static void check_sockaddr(int pf, struct sockaddr_storage *ss, + socklen_t salen) +{ + struct sockaddr_in6 *sin6; + struct sockaddr_in *sin; + socklen_t wanted_size = 0; + + switch (pf) { + case AF_INET: + wanted_size = sizeof(*sin); + sin = (void *)ss; + if (!sin->sin_port) + fprintf(stderr, "accept: something wrong: ip connection from port 0"); + break; + case AF_INET6: + wanted_size = sizeof(*sin6); + sin6 = (void *)ss; + if (!sin6->sin6_port) + fprintf(stderr, "accept: something wrong: ipv6 connection from port 0"); + break; + default: + fprintf(stderr, "accept: Unknown pf %d, salen %u\n", pf, salen); + return; + } + + if (salen != wanted_size) + fprintf(stderr, "accept: size mismatch, got %d expected %d\n", + (int)salen, wanted_size); + + if (ss->ss_family != pf) + fprintf(stderr, "accept: pf mismatch, expect %d, ss_family is %d\n", + (int)ss->ss_family, pf); +} + +static void check_getpeername(int fd, struct sockaddr_storage *ss, socklen_t salen) +{ + struct sockaddr_storage peerss; + socklen_t peersalen = sizeof(peerss); + + if (getpeername(fd, (struct sockaddr *)&peerss, &peersalen) < 0) { + perror("getpeername"); + return; + } + + if (peersalen != salen) { + fprintf(stderr, "%s: %d vs %d\n", __func__, peersalen, salen); + return; + } + + if (memcmp(ss, &peerss, peersalen)) { + char a[INET6_ADDRSTRLEN]; + char b[INET6_ADDRSTRLEN]; + char c[INET6_ADDRSTRLEN]; + char d[INET6_ADDRSTRLEN]; + + xgetnameinfo((struct sockaddr *)ss, salen, + a, sizeof(a), b, sizeof(b)); + + xgetnameinfo((struct sockaddr *)&peerss, peersalen, + c, sizeof(c), d, sizeof(d)); + + fprintf(stderr, "%s: memcmp failure: accept %s vs peername %s, %s vs %s salen %d vs %d\n", + __func__, a, c, b, d, peersalen, salen); + } +} + +static void check_getpeername_connect(int fd) +{ + struct sockaddr_storage ss; + socklen_t salen = sizeof(ss); + char a[INET6_ADDRSTRLEN]; + char b[INET6_ADDRSTRLEN]; + + if (getpeername(fd, (struct sockaddr *)&ss, &salen) < 0) { + perror("getpeername"); + return; + } + + xgetnameinfo((struct sockaddr *)&ss, salen, + a, sizeof(a), b, sizeof(b)); + + if (strcmp(cfg_host, a) || strcmp(cfg_port, b)) + fprintf(stderr, "%s: %s vs %s, %s vs %s\n", __func__, + cfg_host, a, cfg_port, b); +} + +int main_loop_s(int listensock) +{ + struct sockaddr_storage ss; + struct pollfd polls; + socklen_t salen; + int remotesock; + + polls.fd = listensock; + polls.events = POLLIN; + + switch (poll(&polls, 1, poll_timeout)) { + case -1: + perror("poll"); + return 1; + case 0: + fprintf(stderr, "%s: timed out\n", __func__); + close(listensock); + return 2; + } + + salen = sizeof(ss); + remotesock = accept(listensock, (struct sockaddr *)&ss, &salen); + if (remotesock >= 0) { + check_sockaddr(pf, &ss, salen); + check_getpeername(remotesock, &ss, salen); + + return copyfd_io(0, remotesock, 1); + } + + perror("accept"); + + return 1; +} + +static void init_rng(void) +{ + int fd = open("/dev/urandom", O_RDONLY); + unsigned int foo; + + if (fd > 0) { + int ret = read(fd, &foo, sizeof(foo)); + + if (ret < 0) + srand(fd + foo); + close(fd); + } + + srand(foo); +} + +int main_loop(void) +{ + int fd; + + /* listener is ready. */ + fd = sock_connect_mptcp(cfg_host, cfg_port, cfg_sock_proto); + if (fd < 0) + return 2; + + check_getpeername_connect(fd); + + if (cfg_sndbuf) + set_sndbuf(fd, cfg_sndbuf); + + return copyfd_io(0, fd, 1); +} + +int parse_proto(const char *proto) +{ + if (!strcasecmp(proto, "MPTCP")) + return IPPROTO_MPTCP; + if (!strcasecmp(proto, "TCP")) + return IPPROTO_TCP; + + fprintf(stderr, "Unknown protocol: %s\n.", proto); + die_usage(); + + /* silence compiler warning */ + return 0; +} + +int parse_mode(const char *mode) +{ + if (!strcasecmp(mode, "poll")) + return CFG_MODE_POLL; + if (!strcasecmp(mode, "mmap")) + return CFG_MODE_MMAP; + if (!strcasecmp(mode, "sendfile")) + return CFG_MODE_SENDFILE; + + fprintf(stderr, "Unknown test mode: %s\n", mode); + fprintf(stderr, "Supported modes are:\n"); + fprintf(stderr, "\t\t\"poll\" - interleaved read/write using poll()\n"); + fprintf(stderr, "\t\t\"mmap\" - send entire input file (mmap+write), then read response (-l will read input first)\n"); + fprintf(stderr, "\t\t\"sendfile\" - send entire input file (sendfile), then read response (-l will read input first)\n"); + + die_usage(); + + /* silence compiler warning */ + return 0; +} + +int parse_sndbuf(const char *size) +{ + unsigned long s; + + errno = 0; + + s = strtoul(size, NULL, 0); + + if (errno) { + fprintf(stderr, "Invalid sndbuf size %s (%s)\n", + size, strerror(errno)); + die_usage(); + } + + if (s > INT_MAX) { + fprintf(stderr, "Invalid sndbuf size %s (%s)\n", + size, strerror(ERANGE)); + die_usage(); + } + + cfg_sndbuf = s; + + return 0; +} + +static void parse_opts(int argc, char **argv) +{ + int c; + + while ((c = getopt(argc, argv, "6lp:s:hut:m:b:")) != -1) { + switch (c) { + case 'l': + listen_mode = true; + break; + case 'p': + cfg_port = optarg; + break; + case 's': + cfg_sock_proto = parse_proto(optarg); + break; + case 'h': + die_usage(); + break; + case 'u': + tcpulp_audit = true; + break; + case '6': + pf = AF_INET6; + break; + case 't': + poll_timeout = atoi(optarg) * 1000; + if (poll_timeout <= 0) + poll_timeout = -1; + break; + case 'm': + cfg_mode = parse_mode(optarg); + break; + case 'b': + cfg_sndbuf = parse_sndbuf(optarg); + break; + } + } + + if (optind + 1 != argc) + die_usage(); + cfg_host = argv[optind]; + + if (strchr(cfg_host, ':')) + pf = AF_INET6; +} + +int main(int argc, char *argv[]) +{ + init_rng(); + + parse_opts(argc, argv); + + if (tcpulp_audit) + return sock_test_tcpulp(cfg_host, cfg_port) ? 0 : 1; + + if (listen_mode) { + int fd = sock_listen_mptcp(cfg_host, cfg_port); + + if (fd < 0) + return 1; + + if (cfg_sndbuf) + set_sndbuf(fd, cfg_sndbuf); + + return main_loop_s(fd); + } + + return main_loop(); +} diff --git a/tools/testing/selftests/net/mptcp/mptcp_connect.sh b/tools/testing/selftests/net/mptcp/mptcp_connect.sh new file mode 100755 index 000000000000..d573a0feb98d --- /dev/null +++ b/tools/testing/selftests/net/mptcp/mptcp_connect.sh @@ -0,0 +1,595 @@ +#!/bin/bash +# SPDX-License-Identifier: GPL-2.0 + +time_start=$(date +%s) + +optstring="b:d:e:l:r:h4cm:" +ret=0 +sin="" +sout="" +cin="" +cout="" +ksft_skip=4 +capture=false +timeout=30 +ipv6=true +ethtool_random_on=true +tc_delay="$((RANDOM%400))" +tc_loss=$((RANDOM%101)) +tc_reorder="" +testmode="" +sndbuf=0 +options_log=true + +if [ $tc_loss -eq 100 ];then + tc_loss=1% +elif [ $tc_loss -ge 10 ]; then + tc_loss=0.$tc_loss% +elif [ $tc_loss -ge 1 ]; then + tc_loss=0.0$tc_loss% +else + tc_loss="" +fi + +usage() { + echo "Usage: $0 [ -a ]" + echo -e "\t-d: tc/netem delay in milliseconds, e.g. \"-d 10\" (default random)" + echo -e "\t-l: tc/netem loss percentage, e.g. \"-l 0.02\" (default random)" + echo -e "\t-r: tc/netem reorder mode, e.g. \"-r 25% 50% gap 5\", use "-r 0" to disable reordering (default random)" + echo -e "\t-e: ethtool features to disable, e.g.: \"-e tso -e gso\" (default: randomly disable any of tso/gso/gro)" + echo -e "\t-4: IPv4 only: disable IPv6 tests (default: test both IPv4 and IPv6)" + echo -e "\t-c: capture packets for each test using tcpdump (default: no capture)" + echo -e "\t-b: set sndbuf value (default: use kernel default)" + echo -e "\t-m: test mode (poll, sendfile; default: poll)" +} + +while getopts "$optstring" option;do + case "$option" in + "h") + usage $0 + exit 0 + ;; + "d") + if [ $OPTARG -ge 0 ];then + tc_delay="$OPTARG" + else + echo "-d requires numeric argument, got \"$OPTARG\"" 1>&2 + exit 1 + fi + ;; + "e") + ethtool_args="$ethtool_args $OPTARG off" + ethtool_random_on=false + ;; + "l") + tc_loss="$OPTARG" + ;; + "r") + tc_reorder="$OPTARG" + ;; + "4") + ipv6=false + ;; + "c") + capture=true + ;; + "b") + if [ $OPTARG -ge 0 ];then + sndbuf="$OPTARG" + else + echo "-s requires numeric argument, got \"$OPTARG\"" 1>&2 + exit 1 + fi + ;; + "m") + testmode="$OPTARG" + ;; + "?") + usage $0 + exit 1 + ;; + esac +done + +sec=$(date +%s) +rndh=$(printf %x $sec)-$(mktemp -u XXXXXX) +ns1="ns1-$rndh" +ns2="ns2-$rndh" +ns3="ns3-$rndh" +ns4="ns4-$rndh" + +TEST_COUNT=0 + +cleanup() +{ + rm -f "$cin" "$cout" + rm -f "$sin" "$sout" + rm -f "$capout" + + local netns + for netns in "$ns1" "$ns2" "$ns3" "$ns4";do + ip netns del $netns + done +} + +ip -Version > /dev/null 2>&1 +if [ $? -ne 0 ];then + echo "SKIP: Could not run test without ip tool" + exit $ksft_skip +fi + +sin=$(mktemp) +sout=$(mktemp) +cin=$(mktemp) +cout=$(mktemp) +capout=$(mktemp) +trap cleanup EXIT + +for i in "$ns1" "$ns2" "$ns3" "$ns4";do + ip netns add $i || exit $ksft_skip + ip -net $i link set lo up +done + +# "$ns1" ns2 ns3 ns4 +# ns1eth2 ns2eth1 ns2eth3 ns3eth2 ns3eth4 ns4eth3 +# - drop 1% -> reorder 25% +# <- TSO off - + +ip link add ns1eth2 netns "$ns1" type veth peer name ns2eth1 netns "$ns2" +ip link add ns2eth3 netns "$ns2" type veth peer name ns3eth2 netns "$ns3" +ip link add ns3eth4 netns "$ns3" type veth peer name ns4eth3 netns "$ns4" + +ip -net "$ns1" addr add 10.0.1.1/24 dev ns1eth2 +ip -net "$ns1" addr add dead:beef:1::1/64 dev ns1eth2 nodad + +ip -net "$ns1" link set ns1eth2 up +ip -net "$ns1" route add default via 10.0.1.2 +ip -net "$ns1" route add default via dead:beef:1::2 + +ip -net "$ns2" addr add 10.0.1.2/24 dev ns2eth1 +ip -net "$ns2" addr add dead:beef:1::2/64 dev ns2eth1 nodad +ip -net "$ns2" link set ns2eth1 up + +ip -net "$ns2" addr add 10.0.2.1/24 dev ns2eth3 +ip -net "$ns2" addr add dead:beef:2::1/64 dev ns2eth3 nodad +ip -net "$ns2" link set ns2eth3 up +ip -net "$ns2" route add default via 10.0.2.2 +ip -net "$ns2" route add default via dead:beef:2::2 +ip netns exec "$ns2" sysctl -q net.ipv4.ip_forward=1 +ip netns exec "$ns2" sysctl -q net.ipv6.conf.all.forwarding=1 + +ip -net "$ns3" addr add 10.0.2.2/24 dev ns3eth2 +ip -net "$ns3" addr add dead:beef:2::2/64 dev ns3eth2 nodad +ip -net "$ns3" link set ns3eth2 up + +ip -net "$ns3" addr add 10.0.3.2/24 dev ns3eth4 +ip -net "$ns3" addr add dead:beef:3::2/64 dev ns3eth4 nodad +ip -net "$ns3" link set ns3eth4 up +ip -net "$ns3" route add default via 10.0.2.1 +ip -net "$ns3" route add default via dead:beef:2::1 +ip netns exec "$ns3" sysctl -q net.ipv4.ip_forward=1 +ip netns exec "$ns3" sysctl -q net.ipv6.conf.all.forwarding=1 + +ip -net "$ns4" addr add 10.0.3.1/24 dev ns4eth3 +ip -net "$ns4" addr add dead:beef:3::1/64 dev ns4eth3 nodad +ip -net "$ns4" link set ns4eth3 up +ip -net "$ns4" route add default via 10.0.3.2 +ip -net "$ns4" route add default via dead:beef:3::2 + +set_ethtool_flags() { + local ns="$1" + local dev="$2" + local flags="$3" + + ip netns exec $ns ethtool -K $dev $flags 2>/dev/null + [ $? -eq 0 ] && echo "INFO: set $ns dev $dev: ethtool -K $flags" +} + +set_random_ethtool_flags() { + local flags="" + local r=$RANDOM + + local pick1=$((r & 1)) + local pick2=$((r & 2)) + local pick3=$((r & 4)) + + [ $pick1 -ne 0 ] && flags="tso off" + [ $pick2 -ne 0 ] && flags="$flags gso off" + [ $pick3 -ne 0 ] && flags="$flags gro off" + + [ -z "$flags" ] && return + + set_ethtool_flags "$1" "$2" "$flags" +} + +if $ethtool_random_on;then + set_random_ethtool_flags "$ns3" ns3eth2 + set_random_ethtool_flags "$ns4" ns4eth3 +else + set_ethtool_flags "$ns3" ns3eth2 "$ethtool_args" + set_ethtool_flags "$ns4" ns4eth3 "$ethtool_args" +fi + +print_file_err() +{ + ls -l "$1" 1>&2 + echo "Trailing bytes are: " + tail -c 27 "$1" +} + +check_transfer() +{ + local in=$1 + local out=$2 + local what=$3 + + cmp "$in" "$out" > /dev/null 2>&1 + if [ $? -ne 0 ] ;then + echo "[ FAIL ] $what does not match (in, out):" + print_file_err "$in" + print_file_err "$out" + + return 1 + fi + + return 0 +} + +check_mptcp_disabled() +{ + local disabled_ns + disabled_ns="ns_disabled-$sech-$(mktemp -u XXXXXX)" + ip netns add ${disabled_ns} || exit $ksft_skip + + # net.mptcp.enabled should be enabled by default + if [ "$(ip netns exec ${disabled_ns} sysctl net.mptcp.enabled | awk '{ print $3 }')" -ne 1 ]; then + echo -e "net.mptcp.enabled sysctl is not 1 by default\t\t[ FAIL ]" + ret=1 + return 1 + fi + ip netns exec ${disabled_ns} sysctl -q net.mptcp.enabled=0 + + local err=0 + LANG=C ip netns exec ${disabled_ns} ./mptcp_connect -t $timeout -p 10000 -s MPTCP 127.0.0.1 < "$cin" 2>&1 | \ + grep -q "^socket: Protocol not available$" && err=1 + ip netns delete ${disabled_ns} + + if [ ${err} -eq 0 ]; then + echo -e "New MPTCP socket cannot be blocked via sysctl\t\t[ FAIL ]" + ret=1 + return 1 + fi + + echo -e "New MPTCP socket can be blocked via sysctl\t\t[ OK ]" + return 0 +} + +check_mptcp_ulp_setsockopt() +{ + local t retval + t="ns_ulp-$sech-$(mktemp -u XXXXXX)" + + ip netns add ${t} || exit $ksft_skip + if ! ip netns exec ${t} ./mptcp_connect -u -p 10000 -s TCP 127.0.0.1 2>&1; then + printf "setsockopt(..., TCP_ULP, \"mptcp\", ...) allowed\t[ FAIL ]\n" + retval=1 + ret=$retval + else + printf "setsockopt(..., TCP_ULP, \"mptcp\", ...) blocked\t[ OK ]\n" + retval=0 + fi + ip netns del ${t} + return $retval +} + +# $1: IP address +is_v6() +{ + [ -z "${1##*:*}" ] +} + +do_ping() +{ + local listener_ns="$1" + local connector_ns="$2" + local connect_addr="$3" + local ping_args="-q -c 1" + + if is_v6 "${connect_addr}"; then + $ipv6 || return 0 + ping_args="${ping_args} -6" + fi + + ip netns exec ${connector_ns} ping ${ping_args} $connect_addr >/dev/null + if [ $? -ne 0 ] ; then + echo "$listener_ns -> $connect_addr connectivity [ FAIL ]" 1>&2 + ret=1 + + return 1 + fi + + return 0 +} + +# $1: ns, $2: port +wait_local_port_listen() +{ + local listener_ns="${1}" + local port="${2}" + + local port_hex i + + port_hex="$(printf "%04X" "${port}")" + for i in $(seq 10); do + ip netns exec "${listener_ns}" cat /proc/net/tcp* | \ + awk "BEGIN {rc=1} {if (\$2 ~ /:${port_hex}\$/ && \$4 ~ /0A/) {rc=0; exit}} END {exit rc}" && + break + sleep 0.1 + done +} + +do_transfer() +{ + local listener_ns="$1" + local connector_ns="$2" + local cl_proto="$3" + local srv_proto="$4" + local connect_addr="$5" + local local_addr="$6" + local extra_args="" + + local port + port=$((10000+$TEST_COUNT)) + TEST_COUNT=$((TEST_COUNT+1)) + + if [ "$sndbuf" -gt 0 ]; then + extra_args="$extra_args -b $sndbuf" + fi + + if [ -n "$testmode" ]; then + extra_args="$extra_args -m $testmode" + fi + + if [ -n "$extra_args" ] && $options_log; then + options_log=false + echo "INFO: extra options: $extra_args" + fi + + :> "$cout" + :> "$sout" + :> "$capout" + + local addr_port + addr_port=$(printf "%s:%d" ${connect_addr} ${port}) + printf "%.3s %-5s -> %.3s (%-20s) %-5s\t" ${connector_ns} ${cl_proto} ${listener_ns} ${addr_port} ${srv_proto} + + if $capture; then + local capuser + if [ -z $SUDO_USER ] ; then + capuser="" + else + capuser="-Z $SUDO_USER" + fi + + local capfile="${listener_ns}-${connector_ns}-${cl_proto}-${srv_proto}-${connect_addr}.pcap" + + ip netns exec ${listener_ns} tcpdump -i any -s 65535 -B 32768 $capuser -w $capfile > "$capout" 2>&1 & + local cappid=$! + + sleep 1 + fi + + ip netns exec ${listener_ns} ./mptcp_connect -t $timeout -l -p $port -s ${srv_proto} $extra_args $local_addr < "$sin" > "$sout" & + local spid=$! + + wait_local_port_listen "${listener_ns}" "${port}" + + local start + start=$(date +%s%3N) + ip netns exec ${connector_ns} ./mptcp_connect -t $timeout -p $port -s ${cl_proto} $extra_args $connect_addr < "$cin" > "$cout" & + local cpid=$! + + wait $cpid + local retc=$? + wait $spid + local rets=$? + + local stop + stop=$(date +%s%3N) + + if $capture; then + sleep 1 + kill $cappid + fi + + local duration + duration=$((stop-start)) + duration=$(printf "(duration %05sms)" $duration) + if [ ${rets} -ne 0 ] || [ ${retc} -ne 0 ]; then + echo "$duration [ FAIL ] client exit code $retc, server $rets" 1>&2 + echo "\nnetns ${listener_ns} socket stat for $port:" 1>&2 + ip netns exec ${listener_ns} ss -nita 1>&2 -o "sport = :$port" + echo "\nnetns ${connector_ns} socket stat for $port:" 1>&2 + ip netns exec ${connector_ns} ss -nita 1>&2 -o "dport = :$port" + + cat "$capout" + return 1 + fi + + check_transfer $sin $cout "file received by client" + retc=$? + check_transfer $cin $sout "file received by server" + rets=$? + + if [ $retc -eq 0 ] && [ $rets -eq 0 ];then + echo "$duration [ OK ]" + cat "$capout" + return 0 + fi + + cat "$capout" + return 1 +} + +make_file() +{ + local name=$1 + local who=$2 + + local SIZE TSIZE + SIZE=$((RANDOM % (1024 * 8))) + TSIZE=$((SIZE * 1024)) + + dd if=/dev/urandom of="$name" bs=1024 count=$SIZE 2> /dev/null + + SIZE=$((RANDOM % 1024)) + SIZE=$((SIZE + 128)) + TSIZE=$((TSIZE + SIZE)) + dd if=/dev/urandom conv=notrunc of="$name" bs=1 count=$SIZE 2> /dev/null + echo -e "\nMPTCP_TEST_FILE_END_MARKER" >> "$name" + + echo "Created $name (size $TSIZE) containing data sent by $who" +} + +run_tests_lo() +{ + local listener_ns="$1" + local connector_ns="$2" + local connect_addr="$3" + local loopback="$4" + local lret=0 + + # skip if test programs are running inside same netns for subsequent runs. + if [ $loopback -eq 0 ] && [ ${listener_ns} = ${connector_ns} ]; then + return 0 + fi + + # skip if we don't want v6 + if ! $ipv6 && is_v6 "${connect_addr}"; then + return 0 + fi + + local local_addr + if is_v6 "${connect_addr}"; then + local_addr="::" + else + local_addr="0.0.0.0" + fi + + do_transfer ${listener_ns} ${connector_ns} MPTCP MPTCP ${connect_addr} ${local_addr} + lret=$? + if [ $lret -ne 0 ]; then + ret=$lret + return 1 + fi + + # don't bother testing fallback tcp except for loopback case. + if [ ${listener_ns} != ${connector_ns} ]; then + return 0 + fi + + do_transfer ${listener_ns} ${connector_ns} MPTCP TCP ${connect_addr} ${local_addr} + lret=$? + if [ $lret -ne 0 ]; then + ret=$lret + return 1 + fi + + do_transfer ${listener_ns} ${connector_ns} TCP MPTCP ${connect_addr} ${local_addr} + lret=$? + if [ $lret -ne 0 ]; then + ret=$lret + return 1 + fi + + return 0 +} + +run_tests() +{ + run_tests_lo $1 $2 $3 0 +} + +make_file "$cin" "client" +make_file "$sin" "server" + +check_mptcp_disabled + +check_mptcp_ulp_setsockopt + +echo "INFO: validating network environment with pings" +for sender in "$ns1" "$ns2" "$ns3" "$ns4";do + do_ping "$ns1" $sender 10.0.1.1 + do_ping "$ns1" $sender dead:beef:1::1 + + do_ping "$ns2" $sender 10.0.1.2 + do_ping "$ns2" $sender dead:beef:1::2 + do_ping "$ns2" $sender 10.0.2.1 + do_ping "$ns2" $sender dead:beef:2::1 + + do_ping "$ns3" $sender 10.0.2.2 + do_ping "$ns3" $sender dead:beef:2::2 + do_ping "$ns3" $sender 10.0.3.2 + do_ping "$ns3" $sender dead:beef:3::2 + + do_ping "$ns4" $sender 10.0.3.1 + do_ping "$ns4" $sender dead:beef:3::1 +done + +[ -n "$tc_loss" ] && tc -net "$ns2" qdisc add dev ns2eth3 root netem loss random $tc_loss +echo -n "INFO: Using loss of $tc_loss " +test "$tc_delay" -gt 0 && echo -n "delay $tc_delay ms " + +if [ -z "${tc_reorder}" ]; then + reorder1=$((RANDOM%10)) + reorder1=$((100 - reorder1)) + reorder2=$((RANDOM%100)) + + if [ $tc_delay -gt 0 ] && [ $reorder1 -lt 100 ] && [ $reorder2 -gt 0 ]; then + tc_reorder="reorder ${reorder1}% ${reorder2}%" + echo -n "$tc_reorder " + fi +elif [ "$tc_reorder" = "0" ];then + tc_reorder="" +elif [ "$tc_delay" -gt 0 ];then + # reordering requires some delay + tc_reorder="reorder $tc_reorder" + echo -n "$tc_reorder " +fi + +echo "on ns3eth4" + +tc -net "$ns3" qdisc add dev ns3eth4 root netem delay ${tc_delay}ms $tc_reorder + +for sender in $ns1 $ns2 $ns3 $ns4;do + run_tests_lo "$ns1" "$sender" 10.0.1.1 1 + if [ $ret -ne 0 ] ;then + echo "FAIL: Could not even run loopback test" 1>&2 + exit $ret + fi + run_tests_lo "$ns1" $sender dead:beef:1::1 1 + if [ $ret -ne 0 ] ;then + echo "FAIL: Could not even run loopback v6 test" 2>&1 + exit $ret + fi + + run_tests "$ns2" $sender 10.0.1.2 + run_tests "$ns2" $sender dead:beef:1::2 + run_tests "$ns2" $sender 10.0.2.1 + run_tests "$ns2" $sender dead:beef:2::1 + + run_tests "$ns3" $sender 10.0.2.2 + run_tests "$ns3" $sender dead:beef:2::2 + run_tests "$ns3" $sender 10.0.3.2 + run_tests "$ns3" $sender dead:beef:3::2 + + run_tests "$ns4" $sender 10.0.3.1 + run_tests "$ns4" $sender dead:beef:3::1 +done + +time_end=$(date +%s) +time_run=$((time_end-time_start)) + +echo "Time: ${time_run} seconds" + +exit $ret diff --git a/tools/testing/selftests/net/mptcp/settings b/tools/testing/selftests/net/mptcp/settings new file mode 100644 index 000000000000..026384c189c9 --- /dev/null +++ b/tools/testing/selftests/net/mptcp/settings @@ -0,0 +1 @@ +timeout=450 From patchwork Wed Jan 22 00:56:30 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Christoph Paasch X-Patchwork-Id: 1226855 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (no SPF record) smtp.mailfrom=lists.01.org (client-ip=198.145.21.10; helo=ml01.01.org; envelope-from=mptcp-bounces@lists.01.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=quarantine dis=none) header.from=apple.com Authentication-Results: ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=apple.com header.i=@apple.com header.a=rsa-sha256 header.s=20180706 header.b=T4J5d3c/; dkim-atps=neutral Received: from ml01.01.org (ml01.01.org [198.145.21.10]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 482Rnd37G0z9sRG for ; Wed, 22 Jan 2020 11:57:09 +1100 (AEDT) Received: from ml01.vlan13.01.org (localhost [IPv6:::1]) by ml01.01.org (Postfix) with ESMTP id 13CE01007B8EA; Tue, 21 Jan 2020 17:00:26 -0800 (PST) Received-SPF: Pass (mailfrom) identity=mailfrom; client-ip=17.151.62.67; helo=nwk-aaemail-lapp02.apple.com; envelope-from=cpaasch@apple.com; receiver= Received: from nwk-aaemail-lapp02.apple.com (nwk-aaemail-lapp02.apple.com [17.151.62.67]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ml01.01.org (Postfix) with ESMTPS id 78CE81007B8C8 for ; Tue, 21 Jan 2020 17:00:23 -0800 (PST) Received: from pps.filterd (nwk-aaemail-lapp02.apple.com [127.0.0.1]) by nwk-aaemail-lapp02.apple.com (8.16.0.27/8.16.0.27) with SMTP id 00M0v2eL020009; Tue, 21 Jan 2020 16:57:04 -0800 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=apple.com; h=sender : from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding; s=20180706; bh=ZhG+UGDhCdTsG2sDTCZShDaAkDWJx+7R0dzdMclbDMM=; b=T4J5d3c/5Jx9yS+GiIQj6QjUA3R2JuWRE40fyy1EcrJpNiUtJ5w2XlaLSOZxT7EzXc3k EmDtvV7u2YqZjv4OTeg+Ikmpxocy08gtZisdiW5+K6NVxAzoXIy/Z5yLPrie8p2LcnnL eVj2OD//7PpFajcaFjkDg5RC/VthbWDHsItXBOVPO5rK3lsrz+AKwTFgHyCMpyxXLOwm z8RwN40/Jz+cFm8y/btzRcJumULU2/ofJKKnNT73v1VwPODYrtNJBdpWl5hOdTyFY+1t v9vh9tqk2ghOrvLvifb2Xuo1AKwP+Df7BttLpj/m4PDxplX0yP6Vc1Qdzcqi93O8f1/A +w== Received: from ma1-mtap-s03.corp.apple.com (ma1-mtap-s03.corp.apple.com [17.40.76.7]) by nwk-aaemail-lapp02.apple.com with ESMTP id 2xkyfq8e4a-8 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NO); Tue, 21 Jan 2020 16:57:04 -0800 Received: from nwk-mmpp-sz13.apple.com (nwk-mmpp-sz13.apple.com [17.128.115.216]) by ma1-mtap-s03.corp.apple.com (Oracle Communications Messaging Server 8.0.2.4.20190507 64bit (built May 7 2019)) with ESMTPS id <0Q4H00064HAZD600@ma1-mtap-s03.corp.apple.com>; Tue, 21 Jan 2020 16:57:03 -0800 (PST) Received: from process_milters-daemon.nwk-mmpp-sz13.apple.com by nwk-mmpp-sz13.apple.com (Oracle Communications Messaging Server 8.0.2.4.20190507 64bit (built May 7 2019)) id <0Q4H00F00F5G4K00@nwk-mmpp-sz13.apple.com>; Tue, 21 Jan 2020 16:57:02 -0800 (PST) X-Va-A: X-Va-T-CD: 4b1e0bf36502e052fc75ad21b706ed24 X-Va-E-CD: 8e11cccf87192e4df3f407d43544ff16 X-Va-R-CD: b9324b3a4efeb48edefc2f9dfbbde0b0 X-Va-CD: 0 X-Va-ID: 37a035fe-7ae2-475d-81be-617b25350902 X-V-A: X-V-T-CD: 4b1e0bf36502e052fc75ad21b706ed24 X-V-E-CD: 8e11cccf87192e4df3f407d43544ff16 X-V-R-CD: b9324b3a4efeb48edefc2f9dfbbde0b0 X-V-CD: 0 X-V-ID: 1c722b40-b1c9-43de-a0c6-1197564082b4 X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:,, definitions=2020-01-17_05:,, signatures=0 Received: from localhost ([17.192.155.241]) by nwk-mmpp-sz13.apple.com (Oracle Communications Messaging Server 8.0.2.4.20190507 64bit (built May 7 2019)) with ESMTPSA id <0Q4H00DSWHB1DC30@nwk-mmpp-sz13.apple.com>; Tue, 21 Jan 2020 16:57:01 -0800 (PST) Sender: cpaasch@apple.com From: Christoph Paasch To: netdev@vger.kernel.org Date: Tue, 21 Jan 2020 16:56:30 -0800 Message-id: <20200122005633.21229-17-cpaasch@apple.com> X-Mailer: git-send-email 2.23.0 In-reply-to: <20200122005633.21229-1-cpaasch@apple.com> References: <20200122005633.21229-1-cpaasch@apple.com> MIME-version: 1.0 X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:, , definitions=2020-01-17_05:, , signatures=0 Message-ID-Hash: MFOOMAMNHCXTEGNNDOWV6DRTVYK2UFEF X-Message-ID-Hash: MFOOMAMNHCXTEGNNDOWV6DRTVYK2UFEF X-MailFrom: cpaasch@apple.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; suspicious-header CC: mptcp@lists.01.org X-Mailman-Version: 3.1.1 Precedence: list Subject: [MPTCP] [PATCH net-next v3 16/19] mptcp: move from sha1 (v0) to sha256 (v1) List-Id: Discussions regarding MPTCP upstreaming Archived-At: List-Archive: List-Help: List-Post: List-Subscribe: List-Unsubscribe: From: Paolo Abeni For simplicity's sake use directly sha256 primitives (and pull them as a required build dep). Add optional, boot-time self-tests for the hmac function. Signed-off-by: Paolo Abeni Signed-off-by: Christoph Paasch --- net/mptcp/Kconfig | 10 ++++ net/mptcp/crypto.c | 140 ++++++++++++++++++++++++++----------------- net/mptcp/options.c | 9 ++- net/mptcp/protocol.h | 4 +- 4 files changed, 104 insertions(+), 59 deletions(-) diff --git a/net/mptcp/Kconfig b/net/mptcp/Kconfig index c1a99f07a4cd..5db56d2218c5 100644 --- a/net/mptcp/Kconfig +++ b/net/mptcp/Kconfig @@ -3,6 +3,7 @@ config MPTCP bool "MPTCP: Multipath TCP" depends on INET select SKB_EXTENSIONS + select CRYPTO_LIB_SHA256 help Multipath TCP (MPTCP) connections send and receive data over multiple subflows in order to utilize multiple network paths. Each subflow @@ -14,3 +15,12 @@ config MPTCP_IPV6 depends on MPTCP select IPV6 default y + +config MPTCP_HMAC_TEST + bool "Tests for MPTCP HMAC implementation" + default n + help + This option enable boot time self-test for the HMAC implementation + used by the MPTCP code + + Say N if you are unsure. diff --git a/net/mptcp/crypto.c b/net/mptcp/crypto.c index bbd6d01af211..40d1bb18fd60 100644 --- a/net/mptcp/crypto.c +++ b/net/mptcp/crypto.c @@ -21,67 +21,36 @@ */ #include -#include +#include #include #include "protocol.h" -struct sha1_state { - u32 workspace[SHA_WORKSPACE_WORDS]; - u32 digest[SHA_DIGEST_WORDS]; - unsigned int count; -}; - -static void sha1_init(struct sha1_state *state) -{ - sha_init(state->digest); - state->count = 0; -} - -static void sha1_update(struct sha1_state *state, u8 *input) -{ - sha_transform(state->digest, input, state->workspace); - state->count += SHA_MESSAGE_BYTES; -} - -static void sha1_pad_final(struct sha1_state *state, u8 *input, - unsigned int length, __be32 *mptcp_hashed_key) -{ - int i; - - input[length] = 0x80; - memset(&input[length + 1], 0, SHA_MESSAGE_BYTES - length - 9); - put_unaligned_be64((length + state->count) << 3, - &input[SHA_MESSAGE_BYTES - 8]); - - sha_transform(state->digest, input, state->workspace); - for (i = 0; i < SHA_DIGEST_WORDS; ++i) - put_unaligned_be32(state->digest[i], &mptcp_hashed_key[i]); - - memzero_explicit(state->workspace, SHA_WORKSPACE_WORDS << 2); -} +#define SHA256_DIGEST_WORDS (SHA256_DIGEST_SIZE / 4) void mptcp_crypto_key_sha(u64 key, u32 *token, u64 *idsn) { - __be32 mptcp_hashed_key[SHA_DIGEST_WORDS]; - u8 input[SHA_MESSAGE_BYTES]; - struct sha1_state state; + __be32 mptcp_hashed_key[SHA256_DIGEST_WORDS]; + __be64 input = cpu_to_be64(key); + struct sha256_state state; - sha1_init(&state); - put_unaligned_be64(key, input); - sha1_pad_final(&state, input, 8, mptcp_hashed_key); + sha256_init(&state); + sha256_update(&state, (__force u8 *)&input, sizeof(input)); + sha256_final(&state, (u8 *)mptcp_hashed_key); if (token) *token = be32_to_cpu(mptcp_hashed_key[0]); if (idsn) - *idsn = be64_to_cpu(*((__be64 *)&mptcp_hashed_key[3])); + *idsn = be64_to_cpu(*((__be64 *)&mptcp_hashed_key[6])); } void mptcp_crypto_hmac_sha(u64 key1, u64 key2, u32 nonce1, u32 nonce2, - u32 *hash_out) + void *hmac) { - u8 input[SHA_MESSAGE_BYTES * 2]; - struct sha1_state state; + u8 input[SHA256_BLOCK_SIZE + SHA256_DIGEST_SIZE]; + __be32 mptcp_hashed_key[SHA256_DIGEST_WORDS]; + __be32 *hash_out = (__force __be32 *)hmac; + struct sha256_state state; u8 key1be[8]; u8 key2be[8]; int i; @@ -96,17 +65,16 @@ void mptcp_crypto_hmac_sha(u64 key1, u64 key2, u32 nonce1, u32 nonce2, for (i = 0; i < 8; i++) input[i + 8] ^= key2be[i]; - put_unaligned_be32(nonce1, &input[SHA_MESSAGE_BYTES]); - put_unaligned_be32(nonce2, &input[SHA_MESSAGE_BYTES + 4]); + put_unaligned_be32(nonce1, &input[SHA256_BLOCK_SIZE]); + put_unaligned_be32(nonce2, &input[SHA256_BLOCK_SIZE + 4]); - sha1_init(&state); - sha1_update(&state, input); + sha256_init(&state); + sha256_update(&state, input, SHA256_BLOCK_SIZE + 8); /* emit sha256(K1 || msg) on the second input block, so we can * reuse 'input' for the last hashing */ - sha1_pad_final(&state, &input[SHA_MESSAGE_BYTES], 8, - (__force __be32 *)&input[SHA_MESSAGE_BYTES]); + sha256_final(&state, &input[SHA256_BLOCK_SIZE]); /* Prepare second part of hmac */ memset(input, 0x5C, SHA_MESSAGE_BYTES); @@ -115,8 +83,70 @@ void mptcp_crypto_hmac_sha(u64 key1, u64 key2, u32 nonce1, u32 nonce2, for (i = 0; i < 8; i++) input[i + 8] ^= key2be[i]; - sha1_init(&state); - sha1_update(&state, input); - sha1_pad_final(&state, &input[SHA_MESSAGE_BYTES], SHA_DIGEST_WORDS << 2, - (__be32 *)hash_out); + sha256_init(&state); + sha256_update(&state, input, SHA256_BLOCK_SIZE + SHA256_DIGEST_SIZE); + sha256_final(&state, (u8 *)mptcp_hashed_key); + + /* takes only first 160 bits */ + for (i = 0; i < 5; i++) + hash_out[i] = mptcp_hashed_key[i]; +} + +#ifdef CONFIG_MPTCP_HMAC_TEST +struct test_cast { + char *key; + char *msg; + char *result; +}; + +/* we can't reuse RFC 4231 test vectors, as we have constraint on the + * input and key size, and we truncate the output. + */ +static struct test_cast tests[] = { + { + .key = "0b0b0b0b0b0b0b0b", + .msg = "48692054", + .result = "8385e24fb4235ac37556b6b886db106284a1da67", + }, + { + .key = "aaaaaaaaaaaaaaaa", + .msg = "dddddddd", + .result = "2c5e219164ff1dca1c4a92318d847bb6b9d44492", + }, + { + .key = "0102030405060708", + .msg = "cdcdcdcd", + .result = "e73b9ba9969969cefb04aa0d6df18ec2fcc075b6", + }, +}; + +static int __init test_mptcp_crypto(void) +{ + char hmac[20], hmac_hex[41]; + u32 nonce1, nonce2; + u64 key1, key2; + int i, j; + + for (i = 0; i < ARRAY_SIZE(tests); ++i) { + /* mptcp hmap will convert to be before computing the hmac */ + key1 = be64_to_cpu(*((__be64 *)&tests[i].key[0])); + key2 = be64_to_cpu(*((__be64 *)&tests[i].key[8])); + nonce1 = be32_to_cpu(*((__be32 *)&tests[i].msg[0])); + nonce2 = be32_to_cpu(*((__be32 *)&tests[i].msg[4])); + + mptcp_crypto_hmac_sha(key1, key2, nonce1, nonce2, hmac); + for (j = 0; j < 20; ++j) + sprintf(&hmac_hex[j << 1], "%02x", hmac[j] & 0xff); + hmac_hex[40] = 0; + + if (memcmp(hmac_hex, tests[i].result, 40)) + pr_err("test %d failed, got %s expected %s", i, + hmac_hex, tests[i].result); + else + pr_info("test %d [ ok ]", i); + } + return 0; } + +late_initcall(test_mptcp_crypto); +#endif diff --git a/net/mptcp/options.c b/net/mptcp/options.c index 1fa8496f3551..1aec742ca8e1 100644 --- a/net/mptcp/options.c +++ b/net/mptcp/options.c @@ -9,6 +9,11 @@ #include #include "protocol.h" +static bool mptcp_cap_flag_sha256(u8 flags) +{ + return (flags & MPTCP_CAP_FLAG_MASK) == MPTCP_CAP_HMAC_SHA256; +} + void mptcp_parse_option(const unsigned char *ptr, int opsize, struct tcp_options_received *opt_rx) { @@ -29,7 +34,7 @@ void mptcp_parse_option(const unsigned char *ptr, int opsize, break; flags = *ptr++; - if (!((flags & MPTCP_CAP_FLAG_MASK) == MPTCP_CAP_HMAC_SHA1) || + if (!mptcp_cap_flag_sha256(flags) || (flags & MPTCP_CAP_EXTENSIBILITY)) break; @@ -399,7 +404,7 @@ void mptcp_write_options(__be32 *ptr, struct mptcp_out_options *opts) *ptr++ = htonl((TCPOPT_MPTCP << 24) | (len << 16) | (MPTCPOPT_MP_CAPABLE << 12) | (MPTCP_SUPPORTED_VERSION << 8) | - MPTCP_CAP_HMAC_SHA1); + MPTCP_CAP_HMAC_SHA256); put_unaligned_be64(opts->sndr_key, ptr); ptr += 2; if (OPTION_MPTCP_MPC_ACK & opts->suboptions) { diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h index a8fad7d78565..a355bb1cf31b 100644 --- a/net/mptcp/protocol.h +++ b/net/mptcp/protocol.h @@ -43,7 +43,7 @@ #define MPTCP_VERSION_MASK (0x0F) #define MPTCP_CAP_CHECKSUM_REQD BIT(7) #define MPTCP_CAP_EXTENSIBILITY BIT(6) -#define MPTCP_CAP_HMAC_SHA1 BIT(0) +#define MPTCP_CAP_HMAC_SHA256 BIT(0) #define MPTCP_CAP_FLAG_MASK (0x3F) /* MPTCP DSS flags */ @@ -216,7 +216,7 @@ static inline void mptcp_crypto_key_gen_sha(u64 *key, u32 *token, u64 *idsn) } void mptcp_crypto_hmac_sha(u64 key1, u64 key2, u32 nonce1, u32 nonce2, - u32 *hash_out); + void *hash_out); static inline struct mptcp_ext *mptcp_get_ext(struct sk_buff *skb) { From patchwork Wed Jan 22 00:56:31 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Christoph Paasch X-Patchwork-Id: 1226863 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (no SPF record) smtp.mailfrom=lists.01.org (client-ip=198.145.21.10; helo=ml01.01.org; envelope-from=mptcp-bounces@lists.01.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=quarantine dis=none) header.from=apple.com Authentication-Results: ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=apple.com header.i=@apple.com header.a=rsa-sha256 header.s=20180706 header.b=PghATJ3u; dkim-atps=neutral Received: from ml01.01.org (ml01.01.org [198.145.21.10]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 482Rnl17Dmz9sRR for ; Wed, 22 Jan 2020 11:57:15 +1100 (AEDT) Received: from ml01.vlan13.01.org (localhost [IPv6:::1]) by ml01.01.org (Postfix) with ESMTP id 6FB061007B8E1; Tue, 21 Jan 2020 17:00:31 -0800 (PST) Received-SPF: Pass (mailfrom) identity=mailfrom; client-ip=17.171.2.72; helo=ma1-aaemail-dr-lapp03.apple.com; envelope-from=cpaasch@apple.com; receiver= Received: from ma1-aaemail-dr-lapp03.apple.com (ma1-aaemail-dr-lapp03.apple.com [17.171.2.72]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ml01.01.org (Postfix) with ESMTPS id 27E181007B8E5 for ; Tue, 21 Jan 2020 17:00:29 -0800 (PST) Received: from pps.filterd (ma1-aaemail-dr-lapp03.apple.com [127.0.0.1]) by ma1-aaemail-dr-lapp03.apple.com (8.16.0.27/8.16.0.27) with SMTP id 00M0uwKM005654; Tue, 21 Jan 2020 16:57:10 -0800 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=apple.com; h=sender : from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding; s=20180706; bh=jHbtYPKB02MDk+pZNa5Dv3MTI48aEjsOJLrfLpW7Be0=; b=PghATJ3u9imfdmg/mFLvZoL4TPRyXUIOt5ZK9I2Xm9UNk14LK2LZQWB7YT7iVQ8bu7/2 8GULynEI19Ty6lL88dXP1ZswHbLNcFaK6Nr6GiYCMbZnZTo1aefQaxjm3cvJDpPljNSz MyxegpNIAs0xr/cfS4PFuavCPUxbjunLKVuOdeUh9luOMWaujvG6Soir9tLscJ5d5W/O Oo/9BZ3Q++Ielpw3wVeJhavGvCt1NAPBwerrf39IJrZjl8VWzhpecy+NBf93XbnGifKX pSWQFrToNxWUg1emPP0zx7bv6gokfMilBK4zFol/YkMFdtHVpYwH032RVM2gjuw/p7bK WA== Received: from ma1-mtap-s02.corp.apple.com (ma1-mtap-s02.corp.apple.com [17.40.76.6]) by ma1-aaemail-dr-lapp03.apple.com with ESMTP id 2xm1w0q03h-3 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NO); Tue, 21 Jan 2020 16:57:10 -0800 Received: from nwk-mmpp-sz13.apple.com (nwk-mmpp-sz13.apple.com [17.128.115.216]) by ma1-mtap-s02.corp.apple.com (Oracle Communications Messaging Server 8.0.2.4.20190507 64bit (built May 7 2019)) with ESMTPS id <0Q4H0018HHB4F030@ma1-mtap-s02.corp.apple.com>; Tue, 21 Jan 2020 16:57:05 -0800 (PST) Received: from process_milters-daemon.nwk-mmpp-sz13.apple.com by nwk-mmpp-sz13.apple.com (Oracle Communications Messaging Server 8.0.2.4.20190507 64bit (built May 7 2019)) id <0Q4H00F00F5G4K00@nwk-mmpp-sz13.apple.com>; Tue, 21 Jan 2020 16:57:05 -0800 (PST) X-Va-A: X-Va-T-CD: 4b1e0bf36502e052fc75ad21b706ed24 X-Va-E-CD: f18f4596b6dc374f82180780baaa769d X-Va-R-CD: 66696f38f2e4b6d938193453d7cca073 X-Va-CD: 0 X-Va-ID: a3f0620d-6318-4a19-8ca2-bdaf196f6b8d X-V-A: X-V-T-CD: 4b1e0bf36502e052fc75ad21b706ed24 X-V-E-CD: f18f4596b6dc374f82180780baaa769d X-V-R-CD: 66696f38f2e4b6d938193453d7cca073 X-V-CD: 0 X-V-ID: cdf826c6-bf5c-474d-a051-a9ebc3155ec5 X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:,, definitions=2020-01-17_05:,, signatures=0 Received: from localhost ([17.192.155.241]) by nwk-mmpp-sz13.apple.com (Oracle Communications Messaging Server 8.0.2.4.20190507 64bit (built May 7 2019)) with ESMTPSA id <0Q4H0012HHB14Y50@nwk-mmpp-sz13.apple.com>; Tue, 21 Jan 2020 16:57:01 -0800 (PST) Sender: cpaasch@apple.com From: Christoph Paasch To: netdev@vger.kernel.org Date: Tue, 21 Jan 2020 16:56:31 -0800 Message-id: <20200122005633.21229-18-cpaasch@apple.com> X-Mailer: git-send-email 2.23.0 In-reply-to: <20200122005633.21229-1-cpaasch@apple.com> References: <20200122005633.21229-1-cpaasch@apple.com> MIME-version: 1.0 X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:, , definitions=2020-01-17_05:, , signatures=0 Message-ID-Hash: SE5JN4CCPTLOZ5MNECDG2AVLOUPFGKEF X-Message-ID-Hash: SE5JN4CCPTLOZ5MNECDG2AVLOUPFGKEF X-MailFrom: cpaasch@apple.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; suspicious-header CC: mptcp@lists.01.org X-Mailman-Version: 3.1.1 Precedence: list Subject: [MPTCP] [PATCH net-next v3 17/19] mptcp: parse and emit MP_CAPABLE option according to v1 spec List-Id: Discussions regarding MPTCP upstreaming Archived-At: List-Archive: List-Help: List-Post: List-Subscribe: List-Unsubscribe: This implements MP_CAPABLE options parsing and writing according to RFC 6824 bis / RFC 8684: MPTCP v1. Local key is sent on syn/ack, and both keys are sent on 3rd ack. MP_CAPABLE messages len are updated accordingly. We need the skbuff to correctly emit the above, so we push the skbuff struct as an argument all the way from tcp code to the relevant mptcp callbacks. When processing incoming MP_CAPABLE + data, build a full blown DSS-like map info, to simplify later processing. On child socket creation, we need to record the remote key, if available. Signed-off-by: Christoph Paasch --- include/linux/tcp.h | 3 +- include/net/mptcp.h | 17 +++-- net/ipv4/tcp_input.c | 2 +- net/ipv4/tcp_output.c | 2 +- net/mptcp/options.c | 162 +++++++++++++++++++++++++++++++++--------- net/mptcp/protocol.h | 6 +- net/mptcp/subflow.c | 14 +++- 7 files changed, 160 insertions(+), 46 deletions(-) diff --git a/include/linux/tcp.h b/include/linux/tcp.h index 0d00dad4b85d..4e2124607d32 100644 --- a/include/linux/tcp.h +++ b/include/linux/tcp.h @@ -94,7 +94,8 @@ struct mptcp_options_received { data_fin:1, use_ack:1, ack64:1, - __unused:3; + mpc_map:1, + __unused:2; }; #endif diff --git a/include/net/mptcp.h b/include/net/mptcp.h index 8619c1fca741..27627e2d1bc2 100644 --- a/include/net/mptcp.h +++ b/include/net/mptcp.h @@ -23,7 +23,8 @@ struct mptcp_ext { data_fin:1, use_ack:1, ack64:1, - __unused:3; + mpc_map:1, + __unused:2; /* one byte hole */ }; @@ -50,10 +51,10 @@ static inline bool rsk_is_mptcp(const struct request_sock *req) return tcp_rsk(req)->is_mptcp; } -void mptcp_parse_option(const unsigned char *ptr, int opsize, - struct tcp_options_received *opt_rx); -bool mptcp_syn_options(struct sock *sk, unsigned int *size, - struct mptcp_out_options *opts); +void mptcp_parse_option(const struct sk_buff *skb, const unsigned char *ptr, + int opsize, struct tcp_options_received *opt_rx); +bool mptcp_syn_options(struct sock *sk, const struct sk_buff *skb, + unsigned int *size, struct mptcp_out_options *opts); void mptcp_rcv_synsent(struct sock *sk); bool mptcp_synack_options(const struct request_sock *req, unsigned int *size, struct mptcp_out_options *opts); @@ -121,12 +122,14 @@ static inline bool rsk_is_mptcp(const struct request_sock *req) return false; } -static inline void mptcp_parse_option(const unsigned char *ptr, int opsize, +static inline void mptcp_parse_option(const struct sk_buff *skb, + const unsigned char *ptr, int opsize, struct tcp_options_received *opt_rx) { } -static inline bool mptcp_syn_options(struct sock *sk, unsigned int *size, +static inline bool mptcp_syn_options(struct sock *sk, const struct sk_buff *skb, + unsigned int *size, struct mptcp_out_options *opts) { return false; diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 28d31f2c1422..2f475b897c11 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -3926,7 +3926,7 @@ void tcp_parse_options(const struct net *net, break; #endif case TCPOPT_MPTCP: - mptcp_parse_option(ptr, opsize, opt_rx); + mptcp_parse_option(skb, ptr, opsize, opt_rx); break; case TCPOPT_FASTOPEN: diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index c6797a388921..213729ebf323 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -685,7 +685,7 @@ static unsigned int tcp_syn_options(struct sock *sk, struct sk_buff *skb, if (sk_is_mptcp(sk)) { unsigned int size; - if (mptcp_syn_options(sk, &size, &opts->mptcp)) { + if (mptcp_syn_options(sk, skb, &size, &opts->mptcp)) { opts->options |= OPTION_MPTCP; remaining -= size; } diff --git a/net/mptcp/options.c b/net/mptcp/options.c index 1aec742ca8e1..8f82ff9a5a8e 100644 --- a/net/mptcp/options.c +++ b/net/mptcp/options.c @@ -14,8 +14,8 @@ static bool mptcp_cap_flag_sha256(u8 flags) return (flags & MPTCP_CAP_FLAG_MASK) == MPTCP_CAP_HMAC_SHA256; } -void mptcp_parse_option(const unsigned char *ptr, int opsize, - struct tcp_options_received *opt_rx) +void mptcp_parse_option(const struct sk_buff *skb, const unsigned char *ptr, + int opsize, struct tcp_options_received *opt_rx) { struct mptcp_options_received *mp_opt = &opt_rx->mptcp; u8 subtype = *ptr >> 4; @@ -25,13 +25,29 @@ void mptcp_parse_option(const unsigned char *ptr, int opsize, switch (subtype) { case MPTCPOPT_MP_CAPABLE: - if (opsize != TCPOLEN_MPTCP_MPC_SYN && - opsize != TCPOLEN_MPTCP_MPC_ACK) + /* strict size checking */ + if (!(TCP_SKB_CB(skb)->tcp_flags & TCPHDR_SYN)) { + if (skb->len > tcp_hdr(skb)->doff << 2) + expected_opsize = TCPOLEN_MPTCP_MPC_ACK_DATA; + else + expected_opsize = TCPOLEN_MPTCP_MPC_ACK; + } else { + if (TCP_SKB_CB(skb)->tcp_flags & TCPHDR_ACK) + expected_opsize = TCPOLEN_MPTCP_MPC_SYNACK; + else + expected_opsize = TCPOLEN_MPTCP_MPC_SYN; + } + if (opsize != expected_opsize) break; + /* try to be gentle vs future versions on the initial syn */ version = *ptr++ & MPTCP_VERSION_MASK; - if (version != MPTCP_SUPPORTED_VERSION) + if (opsize != TCPOLEN_MPTCP_MPC_SYN) { + if (version != MPTCP_SUPPORTED_VERSION) + break; + } else if (version < MPTCP_SUPPORTED_VERSION) { break; + } flags = *ptr++; if (!mptcp_cap_flag_sha256(flags) || @@ -55,23 +71,40 @@ void mptcp_parse_option(const unsigned char *ptr, int opsize, break; mp_opt->mp_capable = 1; - mp_opt->sndr_key = get_unaligned_be64(ptr); - ptr += 8; - - if (opsize == TCPOLEN_MPTCP_MPC_ACK) { + if (opsize >= TCPOLEN_MPTCP_MPC_SYNACK) { + mp_opt->sndr_key = get_unaligned_be64(ptr); + ptr += 8; + } + if (opsize >= TCPOLEN_MPTCP_MPC_ACK) { mp_opt->rcvr_key = get_unaligned_be64(ptr); ptr += 8; - pr_debug("MP_CAPABLE sndr=%llu, rcvr=%llu", - mp_opt->sndr_key, mp_opt->rcvr_key); - } else { - pr_debug("MP_CAPABLE sndr=%llu", mp_opt->sndr_key); } + if (opsize == TCPOLEN_MPTCP_MPC_ACK_DATA) { + /* Section 3.1.: + * "the data parameters in a MP_CAPABLE are semantically + * equivalent to those in a DSS option and can be used + * interchangeably." + */ + mp_opt->dss = 1; + mp_opt->use_map = 1; + mp_opt->mpc_map = 1; + mp_opt->data_len = get_unaligned_be16(ptr); + ptr += 2; + } + pr_debug("MP_CAPABLE version=%x, flags=%x, optlen=%d sndr=%llu, rcvr=%llu len=%d", + version, flags, opsize, mp_opt->sndr_key, + mp_opt->rcvr_key, mp_opt->data_len); break; case MPTCPOPT_DSS: pr_debug("DSS"); ptr++; + /* we must clear 'mpc_map' be able to detect MP_CAPABLE + * map vs DSS map in mptcp_incoming_options(), and reconstruct + * map info accordingly + */ + mp_opt->mpc_map = 0; flags = (*ptr++) & MPTCP_DSS_FLAG_MASK; mp_opt->data_fin = (flags & MPTCP_DSS_DATA_FIN) != 0; mp_opt->dsn64 = (flags & MPTCP_DSS_DSN64) != 0; @@ -176,18 +209,22 @@ void mptcp_get_options(const struct sk_buff *skb, if (opsize > length) return; /* don't parse partial options */ if (opcode == TCPOPT_MPTCP) - mptcp_parse_option(ptr, opsize, opt_rx); + mptcp_parse_option(skb, ptr, opsize, opt_rx); ptr += opsize - 2; length -= opsize; } } } -bool mptcp_syn_options(struct sock *sk, unsigned int *size, - struct mptcp_out_options *opts) +bool mptcp_syn_options(struct sock *sk, const struct sk_buff *skb, + unsigned int *size, struct mptcp_out_options *opts) { struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(sk); + /* we will use snd_isn to detect first pkt [re]transmission + * in mptcp_established_options_mp() + */ + subflow->snd_isn = TCP_SKB_CB(skb)->end_seq; if (subflow->request_mptcp) { pr_debug("local_key=%llu", subflow->local_key); opts->suboptions = OPTION_MPTCP_MPC_SYN; @@ -212,20 +249,52 @@ void mptcp_rcv_synsent(struct sock *sk) } } -static bool mptcp_established_options_mp(struct sock *sk, unsigned int *size, +static bool mptcp_established_options_mp(struct sock *sk, struct sk_buff *skb, + unsigned int *size, unsigned int remaining, struct mptcp_out_options *opts) { struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(sk); + struct mptcp_ext *mpext; + unsigned int data_len; + + pr_debug("subflow=%p fourth_ack=%d seq=%x:%x remaining=%d", subflow, + subflow->fourth_ack, subflow->snd_isn, + skb ? TCP_SKB_CB(skb)->seq : 0, remaining); + + if (subflow->mp_capable && !subflow->fourth_ack && skb && + subflow->snd_isn == TCP_SKB_CB(skb)->seq) { + /* When skb is not available, we better over-estimate the + * emitted options len. A full DSS option is longer than + * TCPOLEN_MPTCP_MPC_ACK_DATA, so let's the caller try to fit + * that. + */ + mpext = mptcp_get_ext(skb); + data_len = mpext ? mpext->data_len : 0; - if (!subflow->fourth_ack) { + /* we will check ext_copy.data_len in mptcp_write_options() to + * discriminate between TCPOLEN_MPTCP_MPC_ACK_DATA and + * TCPOLEN_MPTCP_MPC_ACK + */ + opts->ext_copy.data_len = data_len; opts->suboptions = OPTION_MPTCP_MPC_ACK; opts->sndr_key = subflow->local_key; opts->rcvr_key = subflow->remote_key; - *size = TCPOLEN_MPTCP_MPC_ACK; - subflow->fourth_ack = 1; - pr_debug("subflow=%p, local_key=%llu, remote_key=%llu", - subflow, subflow->local_key, subflow->remote_key); + + /* Section 3.1. + * The MP_CAPABLE option is carried on the SYN, SYN/ACK, and ACK + * packets that start the first subflow of an MPTCP connection, + * as well as the first packet that carries data + */ + if (data_len > 0) + *size = ALIGN(TCPOLEN_MPTCP_MPC_ACK_DATA, 4); + else + *size = TCPOLEN_MPTCP_MPC_ACK; + + pr_debug("subflow=%p, local_key=%llu, remote_key=%llu map_len=%d", + subflow, subflow->local_key, subflow->remote_key, + data_len); + return true; } return false; @@ -319,7 +388,7 @@ bool mptcp_established_options(struct sock *sk, struct sk_buff *skb, unsigned int opt_size = 0; bool ret = false; - if (mptcp_established_options_mp(sk, &opt_size, remaining, opts)) + if (mptcp_established_options_mp(sk, skb, &opt_size, remaining, opts)) ret = true; else if (mptcp_established_options_dss(sk, skb, &opt_size, remaining, opts)) @@ -371,11 +440,26 @@ void mptcp_incoming_options(struct sock *sk, struct sk_buff *skb, memset(mpext, 0, sizeof(*mpext)); if (mp_opt->use_map) { - mpext->data_seq = mp_opt->data_seq; - mpext->subflow_seq = mp_opt->subflow_seq; + if (mp_opt->mpc_map) { + struct mptcp_subflow_context *subflow = + mptcp_subflow_ctx(sk); + + /* this is an MP_CAPABLE carrying MPTCP data + * we know this map the first chunk of data + */ + mptcp_crypto_key_sha(subflow->remote_key, NULL, + &mpext->data_seq); + mpext->data_seq++; + mpext->subflow_seq = 1; + mpext->dsn64 = 1; + mpext->mpc_map = 1; + } else { + mpext->data_seq = mp_opt->data_seq; + mpext->subflow_seq = mp_opt->subflow_seq; + mpext->dsn64 = mp_opt->dsn64; + } mpext->data_len = mp_opt->data_len; mpext->use_map = 1; - mpext->dsn64 = mp_opt->dsn64; } if (mp_opt->use_ack) { @@ -389,8 +473,7 @@ void mptcp_incoming_options(struct sock *sk, struct sk_buff *skb, void mptcp_write_options(__be32 *ptr, struct mptcp_out_options *opts) { - if ((OPTION_MPTCP_MPC_SYN | - OPTION_MPTCP_MPC_SYNACK | + if ((OPTION_MPTCP_MPC_SYN | OPTION_MPTCP_MPC_SYNACK | OPTION_MPTCP_MPC_ACK) & opts->suboptions) { u8 len; @@ -398,6 +481,8 @@ void mptcp_write_options(__be32 *ptr, struct mptcp_out_options *opts) len = TCPOLEN_MPTCP_MPC_SYN; else if (OPTION_MPTCP_MPC_SYNACK & opts->suboptions) len = TCPOLEN_MPTCP_MPC_SYNACK; + else if (opts->ext_copy.data_len) + len = TCPOLEN_MPTCP_MPC_ACK_DATA; else len = TCPOLEN_MPTCP_MPC_ACK; @@ -405,14 +490,27 @@ void mptcp_write_options(__be32 *ptr, struct mptcp_out_options *opts) (MPTCPOPT_MP_CAPABLE << 12) | (MPTCP_SUPPORTED_VERSION << 8) | MPTCP_CAP_HMAC_SHA256); + + if (!((OPTION_MPTCP_MPC_SYNACK | OPTION_MPTCP_MPC_ACK) & + opts->suboptions)) + goto mp_capable_done; + put_unaligned_be64(opts->sndr_key, ptr); ptr += 2; - if (OPTION_MPTCP_MPC_ACK & opts->suboptions) { - put_unaligned_be64(opts->rcvr_key, ptr); - ptr += 2; - } + if (!((OPTION_MPTCP_MPC_ACK) & opts->suboptions)) + goto mp_capable_done; + + put_unaligned_be64(opts->rcvr_key, ptr); + ptr += 2; + if (!opts->ext_copy.data_len) + goto mp_capable_done; + + put_unaligned_be32(opts->ext_copy.data_len << 16 | + TCPOPT_NOP << 8 | TCPOPT_NOP, ptr); + ptr += 1; } +mp_capable_done: if (opts->ext_copy.use_ack || opts->ext_copy.use_map) { struct mptcp_ext *mpext = &opts->ext_copy; u8 len = TCPOLEN_MPTCP_DSS_BASE; diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h index a355bb1cf31b..36b90024d34d 100644 --- a/net/mptcp/protocol.h +++ b/net/mptcp/protocol.h @@ -11,7 +11,7 @@ #include #include -#define MPTCP_SUPPORTED_VERSION 0 +#define MPTCP_SUPPORTED_VERSION 1 /* MPTCP option bits */ #define OPTION_MPTCP_MPC_SYN BIT(0) @@ -29,9 +29,10 @@ #define MPTCPOPT_MP_FASTCLOSE 7 /* MPTCP suboption lengths */ -#define TCPOLEN_MPTCP_MPC_SYN 12 +#define TCPOLEN_MPTCP_MPC_SYN 4 #define TCPOLEN_MPTCP_MPC_SYNACK 12 #define TCPOLEN_MPTCP_MPC_ACK 20 +#define TCPOLEN_MPTCP_MPC_ACK_DATA 22 #define TCPOLEN_MPTCP_DSS_BASE 4 #define TCPOLEN_MPTCP_DSS_ACK32 4 #define TCPOLEN_MPTCP_DSS_ACK64 8 @@ -106,6 +107,7 @@ struct mptcp_subflow_context { u64 remote_key; u64 idsn; u64 map_seq; + u32 snd_isn; u32 token; u32 rel_write_seq; u32 map_subflow_seq; diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c index 9fb3eb87a20f..8892855f4f52 100644 --- a/net/mptcp/subflow.c +++ b/net/mptcp/subflow.c @@ -77,7 +77,6 @@ static void subflow_init_req(struct request_sock *req, if (err == 0) subflow_req->mp_capable = 1; - subflow_req->remote_key = rx_opt.mptcp.sndr_key; subflow_req->ssn_offset = TCP_SKB_CB(skb)->seq; } } @@ -180,11 +179,22 @@ static struct sock *subflow_syn_recv_sock(const struct sock *sk, bool *own_req) { struct mptcp_subflow_context *listener = mptcp_subflow_ctx(sk); + struct mptcp_subflow_request_sock *subflow_req; + struct tcp_options_received opt_rx; struct sock *child; pr_debug("listener=%p, req=%p, conn=%p", listener, req, listener->conn); - /* if the sk is MP_CAPABLE, we already received the client key */ + /* if the sk is MP_CAPABLE, we need to fetch the client key */ + subflow_req = mptcp_subflow_rsk(req); + if (subflow_req->mp_capable) { + opt_rx.mptcp.mp_capable = 0; + mptcp_get_options(skb, &opt_rx); + if (!opt_rx.mptcp.mp_capable) + subflow_req->mp_capable = 0; + else + subflow_req->remote_key = opt_rx.mptcp.sndr_key; + } child = listener->icsk_af_ops->syn_recv_sock(sk, skb, req, dst, req_unhash, own_req); From patchwork Wed Jan 22 00:56:32 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Christoph Paasch X-Patchwork-Id: 1226864 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (no SPF record) smtp.mailfrom=lists.01.org (client-ip=2001:19d0:306:5::1; helo=ml01.01.org; envelope-from=mptcp-bounces@lists.01.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=quarantine dis=none) header.from=apple.com Authentication-Results: ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=apple.com header.i=@apple.com header.a=rsa-sha256 header.s=20180706 header.b=BNEmHvB6; dkim-atps=neutral Received: from ml01.01.org (ml01.01.org [IPv6:2001:19d0:306:5::1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 482Rnl53Fkz9sRX for ; Wed, 22 Jan 2020 11:57:15 +1100 (AEDT) Received: from ml01.vlan13.01.org (localhost [IPv6:::1]) by ml01.01.org (Postfix) with ESMTP id 81A811007B8DF; Tue, 21 Jan 2020 17:00:32 -0800 (PST) Received-SPF: Pass (mailfrom) identity=mailfrom; client-ip=17.171.2.68; helo=ma1-aaemail-dr-lapp02.apple.com; envelope-from=cpaasch@apple.com; receiver= Received: from ma1-aaemail-dr-lapp02.apple.com (ma1-aaemail-dr-lapp02.apple.com [17.171.2.68]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ml01.01.org (Postfix) with ESMTPS id E7D991007B8E1 for ; Tue, 21 Jan 2020 17:00:29 -0800 (PST) Received: from pps.filterd (ma1-aaemail-dr-lapp02.apple.com [127.0.0.1]) by ma1-aaemail-dr-lapp02.apple.com (8.16.0.27/8.16.0.27) with SMTP id 00M0ux6t000395; Tue, 21 Jan 2020 16:57:10 -0800 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=apple.com; h=sender : from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding; s=20180706; bh=Mo11qu68XXptC7F18Q0zSfwxwPMXjNEChqmbm5/5QPk=; b=BNEmHvB6zKYHZ5n4ZfbAtQm1++6yOKQSbOiu3M4cS/36SF6BEifWFdnPbxWxMNtFd5hD Ot7EbHU9Hh0t2iBaXy9S4k9r4n9l6Gih8PeDQdKAaFIbEs7W9VTuxrLYCtyWLxEC40Ul 5io1xWfUqfmoUAj4XWMcdhqSkK5H+s8JD1emULJTtoe6NEviVZqvseH/UboH36Ry2rW5 LBJ3ZieRT7ElDHSrGBxBSzJCg46a9cFj06qHtqHpi95BdZuQ0cKN+3Kv7gN9QCdC//TK G8fNXPYvU3TNO8gyRDiIEdiQ+/aPzGqoHVAA8bEpAi7sHZMpaa3Xzdo9xWlMDxRlwpZK Sg== Received: from ma1-mtap-s02.corp.apple.com (ma1-mtap-s02.corp.apple.com [17.40.76.6]) by ma1-aaemail-dr-lapp02.apple.com with ESMTP id 2xkyjxgr5d-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NO); Tue, 21 Jan 2020 16:57:10 -0800 Received: from nwk-mmpp-sz13.apple.com (nwk-mmpp-sz13.apple.com [17.128.115.216]) by ma1-mtap-s02.corp.apple.com (Oracle Communications Messaging Server 8.0.2.4.20190507 64bit (built May 7 2019)) with ESMTPS id <0Q4H0018HHB4F030@ma1-mtap-s02.corp.apple.com>; Tue, 21 Jan 2020 16:57:06 -0800 (PST) Received: from process_milters-daemon.nwk-mmpp-sz13.apple.com by nwk-mmpp-sz13.apple.com (Oracle Communications Messaging Server 8.0.2.4.20190507 64bit (built May 7 2019)) id <0Q4H00100GR2PR00@nwk-mmpp-sz13.apple.com>; Tue, 21 Jan 2020 16:57:05 -0800 (PST) X-Va-A: X-Va-T-CD: 4b1e0bf36502e052fc75ad21b706ed24 X-Va-E-CD: 2bb8c2377df4632229015a09c88e84d8 X-Va-R-CD: d36872a0082b8a68a9205f4deb0f4013 X-Va-CD: 0 X-Va-ID: 30d7c300-e5fb-4f9a-baf8-df9bf2c6b0ea X-V-A: X-V-T-CD: 4b1e0bf36502e052fc75ad21b706ed24 X-V-E-CD: 2bb8c2377df4632229015a09c88e84d8 X-V-R-CD: d36872a0082b8a68a9205f4deb0f4013 X-V-CD: 0 X-V-ID: bc994a17-54d6-44d6-a218-9423c783ae97 X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:,, definitions=2020-01-17_05:,, signatures=0 Received: from localhost ([17.192.155.241]) by nwk-mmpp-sz13.apple.com (Oracle Communications Messaging Server 8.0.2.4.20190507 64bit (built May 7 2019)) with ESMTPSA id <0Q4H00DT0HB1DC30@nwk-mmpp-sz13.apple.com>; Tue, 21 Jan 2020 16:57:01 -0800 (PST) Sender: cpaasch@apple.com From: Christoph Paasch To: netdev@vger.kernel.org Date: Tue, 21 Jan 2020 16:56:32 -0800 Message-id: <20200122005633.21229-19-cpaasch@apple.com> X-Mailer: git-send-email 2.23.0 In-reply-to: <20200122005633.21229-1-cpaasch@apple.com> References: <20200122005633.21229-1-cpaasch@apple.com> MIME-version: 1.0 X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:, , definitions=2020-01-17_05:, , signatures=0 Message-ID-Hash: J4M6H2724ETINPRFJ5WBUKBH2FDVZQW4 X-Message-ID-Hash: J4M6H2724ETINPRFJ5WBUKBH2FDVZQW4 X-MailFrom: cpaasch@apple.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; suspicious-header CC: mptcp@lists.01.org X-Mailman-Version: 3.1.1 Precedence: list Subject: [MPTCP] [PATCH net-next v3 18/19] mptcp: process MP_CAPABLE data option List-Id: Discussions regarding MPTCP upstreaming Archived-At: List-Archive: List-Help: List-Post: List-Subscribe: List-Unsubscribe: This patch implements the handling of MP_CAPABLE + data option, as per RFC 6824 bis / RFC 8684: MPTCP v1. On the server side we can receive the remote key after that the connection is established. We need to explicitly track the 'missing remote key' status and avoid emitting a mptcp ack until we get such info. When a late/retransmitted/OoO pkt carrying MP_CAPABLE[+data] option is received, we have to propagate the mptcp seq number info to the msk socket. To avoid ABBA locking issue, explicitly check for that in recvmsg(), where we own msk and subflow sock locks. The above also means that an established mp_capable subflow - still waiting for the remote key - can be 'downgraded' to plain TCP. Such change could potentially block a reader waiting for new data forever - as they hook to msk, while later wake-up after the downgrade will be on subflow only. The above issue is not handled here, we likely have to get rid of msk->fallback to handle that cleanly. Signed-off-by: Christoph Paasch --- net/mptcp/options.c | 56 ++++++++++++++++++++++++++++++++++---------- net/mptcp/protocol.c | 16 +++++++------ net/mptcp/protocol.h | 10 +++++--- net/mptcp/subflow.c | 40 +++++++++++++++++++++++++++---- 4 files changed, 95 insertions(+), 27 deletions(-) diff --git a/net/mptcp/options.c b/net/mptcp/options.c index 8f82ff9a5a8e..45acd877bef3 100644 --- a/net/mptcp/options.c +++ b/net/mptcp/options.c @@ -243,6 +243,7 @@ void mptcp_rcv_synsent(struct sock *sk) pr_debug("subflow=%p", subflow); if (subflow->request_mptcp && tp->rx_opt.mptcp.mp_capable) { subflow->mp_capable = 1; + subflow->can_ack = 1; subflow->remote_key = tp->rx_opt.mptcp.sndr_key; } else { tcp_sk(sk)->is_mptcp = 0; @@ -332,6 +333,7 @@ static bool mptcp_established_options_dss(struct sock *sk, struct sk_buff *skb, struct mptcp_ext *mpext; struct mptcp_sock *msk; unsigned int ack_size; + bool ret = false; u8 tcp_fin; if (skb) { @@ -355,6 +357,14 @@ static bool mptcp_established_options_dss(struct sock *sk, struct sk_buff *skb, if (skb && tcp_fin && subflow->conn->sk_state != TCP_ESTABLISHED) mptcp_write_data_fin(subflow, &opts->ext_copy); + ret = true; + } + + opts->ext_copy.use_ack = 0; + msk = mptcp_sk(subflow->conn); + if (!msk || !READ_ONCE(msk->can_ack)) { + *size = ALIGN(dss_size, 4); + return ret; } ack_size = TCPOLEN_MPTCP_DSS_ACK64; @@ -365,15 +375,7 @@ static bool mptcp_established_options_dss(struct sock *sk, struct sk_buff *skb, dss_size += ack_size; - msk = mptcp_sk(mptcp_subflow_ctx(sk)->conn); - if (msk) { - opts->ext_copy.data_ack = msk->ack_seq; - } else { - mptcp_crypto_key_sha(mptcp_subflow_ctx(sk)->remote_key, - NULL, &opts->ext_copy.data_ack); - opts->ext_copy.data_ack++; - } - + opts->ext_copy.data_ack = msk->ack_seq; opts->ext_copy.ack64 = 1; opts->ext_copy.use_ack = 1; @@ -422,13 +424,46 @@ bool mptcp_synack_options(const struct request_sock *req, unsigned int *size, return false; } +static bool check_fourth_ack(struct mptcp_subflow_context *subflow, + struct sk_buff *skb, + struct mptcp_options_received *mp_opt) +{ + /* here we can process OoO, in-window pkts, only in-sequence 4th ack + * are relevant + */ + if (likely(subflow->fourth_ack || + TCP_SKB_CB(skb)->seq != subflow->ssn_offset + 1)) + return true; + + if (mp_opt->use_ack) + subflow->fourth_ack = 1; + + if (subflow->can_ack) + return true; + + /* If the first established packet does not contain MP_CAPABLE + data + * then fallback to TCP + */ + if (!mp_opt->mp_capable) { + subflow->mp_capable = 0; + tcp_sk(mptcp_subflow_tcp_sock(subflow))->is_mptcp = 0; + return false; + } + subflow->remote_key = mp_opt->sndr_key; + subflow->can_ack = 1; + return true; +} + void mptcp_incoming_options(struct sock *sk, struct sk_buff *skb, struct tcp_options_received *opt_rx) { + struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(sk); struct mptcp_options_received *mp_opt; struct mptcp_ext *mpext; mp_opt = &opt_rx->mptcp; + if (!check_fourth_ack(subflow, skb, mp_opt)) + return; if (!mp_opt->dss) return; @@ -441,9 +476,6 @@ void mptcp_incoming_options(struct sock *sk, struct sk_buff *skb, if (mp_opt->use_map) { if (mp_opt->mpc_map) { - struct mptcp_subflow_context *subflow = - mptcp_subflow_ctx(sk); - /* this is an MP_CAPABLE carrying MPTCP data * we know this map the first chunk of data */ diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c index 45e482864a19..1b64dfaa5f63 100644 --- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -30,7 +30,7 @@ */ static struct socket *__mptcp_nmpc_socket(const struct mptcp_sock *msk) { - if (!msk->subflow || mptcp_subflow_ctx(msk->subflow->sk)->fourth_ack) + if (!msk->subflow || READ_ONCE(msk->can_ack)) return NULL; return msk->subflow; @@ -651,17 +651,20 @@ static struct sock *mptcp_accept(struct sock *sk, int flags, int *err, __mptcp_init_sock(new_mptcp_sock); msk = mptcp_sk(new_mptcp_sock); - msk->remote_key = subflow->remote_key; msk->local_key = subflow->local_key; msk->token = subflow->token; msk->subflow = NULL; mptcp_token_update_accept(newsk, new_mptcp_sock); - mptcp_crypto_key_sha(msk->remote_key, NULL, &ack_seq); msk->write_seq = subflow->idsn + 1; - ack_seq++; - msk->ack_seq = ack_seq; + if (subflow->can_ack) { + msk->can_ack = true; + msk->remote_key = subflow->remote_key; + mptcp_crypto_key_sha(msk->remote_key, NULL, &ack_seq); + ack_seq++; + msk->ack_seq = ack_seq; + } newsk = new_mptcp_sock; mptcp_copy_inaddrs(newsk, ssk); list_add(&subflow->node, &msk->conn_list); @@ -678,8 +681,6 @@ static struct sock *mptcp_accept(struct sock *sk, int flags, int *err, * the receive path and process the pending ones */ lock_sock(ssk); - subflow->map_seq = ack_seq; - subflow->map_subflow_seq = 1; subflow->rel_write_seq = 1; subflow->tcp_sock = ssk; subflow->conn = new_mptcp_sock; @@ -795,6 +796,7 @@ void mptcp_finish_connect(struct sock *ssk) WRITE_ONCE(msk->token, subflow->token); WRITE_ONCE(msk->write_seq, subflow->idsn + 1); WRITE_ONCE(msk->ack_seq, ack_seq); + WRITE_ONCE(msk->can_ack, 1); } static void mptcp_sock_graft(struct sock *sk, struct socket *parent) diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h index 36b90024d34d..10eaa7c7381b 100644 --- a/net/mptcp/protocol.h +++ b/net/mptcp/protocol.h @@ -69,6 +69,7 @@ struct mptcp_sock { u64 ack_seq; u32 token; unsigned long flags; + bool can_ack; struct list_head conn_list; struct skb_ext *cached_ext; /* for the next sendmsg */ struct socket *subflow; /* outgoing connect/listener/!mp_capable */ @@ -84,9 +85,10 @@ static inline struct mptcp_sock *mptcp_sk(const struct sock *sk) struct mptcp_subflow_request_sock { struct tcp_request_sock sk; - u8 mp_capable : 1, + u16 mp_capable : 1, mp_join : 1, - backup : 1; + backup : 1, + remote_key_valid : 1; u64 local_key; u64 remote_key; u64 idsn; @@ -118,8 +120,10 @@ struct mptcp_subflow_context { fourth_ack : 1, /* send initial DSS */ conn_finished : 1, map_valid : 1, + mpc_map : 1, data_avail : 1, - rx_eof : 1; + rx_eof : 1, + can_ack : 1; /* only after processing the remote a key */ struct sock *tcp_sock; /* tcp sk backpointer */ struct sock *conn; /* parent mptcp_sock */ diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c index 8892855f4f52..8cfa1d29d59c 100644 --- a/net/mptcp/subflow.c +++ b/net/mptcp/subflow.c @@ -61,6 +61,7 @@ static void subflow_init_req(struct request_sock *req, mptcp_get_options(skb, &rx_opt); subflow_req->mp_capable = 0; + subflow_req->remote_key_valid = 0; #ifdef CONFIG_TCP_MD5SIG /* no MPTCP if MD5SIG is enabled on this socket or we may run out of @@ -185,17 +186,28 @@ static struct sock *subflow_syn_recv_sock(const struct sock *sk, pr_debug("listener=%p, req=%p, conn=%p", listener, req, listener->conn); - /* if the sk is MP_CAPABLE, we need to fetch the client key */ + /* if the sk is MP_CAPABLE, we try to fetch the client key */ subflow_req = mptcp_subflow_rsk(req); if (subflow_req->mp_capable) { + if (TCP_SKB_CB(skb)->seq != subflow_req->ssn_offset + 1) { + /* here we can receive and accept an in-window, + * out-of-order pkt, which will not carry the MP_CAPABLE + * opt even on mptcp enabled paths + */ + goto create_child; + } + opt_rx.mptcp.mp_capable = 0; mptcp_get_options(skb, &opt_rx); - if (!opt_rx.mptcp.mp_capable) - subflow_req->mp_capable = 0; - else + if (opt_rx.mptcp.mp_capable) { subflow_req->remote_key = opt_rx.mptcp.sndr_key; + subflow_req->remote_key_valid = 1; + } else { + subflow_req->mp_capable = 0; + } } +create_child: child = listener->icsk_af_ops->syn_recv_sock(sk, skb, req, dst, req_unhash, own_req); @@ -377,6 +389,7 @@ static enum mapping_status get_mapping_status(struct sock *ssk) subflow->map_subflow_seq = mpext->subflow_seq; subflow->map_data_len = data_len; subflow->map_valid = 1; + subflow->mpc_map = mpext->mpc_map; pr_debug("new map seq=%llu subflow_seq=%u data_len=%u", subflow->map_seq, subflow->map_subflow_seq, subflow->map_data_len); @@ -428,6 +441,19 @@ static bool subflow_check_data_avail(struct sock *ssk) if (WARN_ON_ONCE(!skb)) return false; + /* if msk lacks the remote key, this subflow must provide an + * MP_CAPABLE-based mapping + */ + if (unlikely(!READ_ONCE(msk->can_ack))) { + if (!subflow->mpc_map) { + ssk->sk_err = EBADMSG; + goto fatal; + } + WRITE_ONCE(msk->remote_key, subflow->remote_key); + WRITE_ONCE(msk->ack_seq, subflow->map_seq); + WRITE_ONCE(msk->can_ack, true); + } + old_ack = READ_ONCE(msk->ack_seq); ack_seq = mptcp_subflow_get_mapped_dsn(subflow); pr_debug("msk ack_seq=%llx subflow ack_seq=%llx", old_ack, @@ -752,13 +778,17 @@ static void subflow_ulp_clone(const struct request_sock *req, return; } + /* see comments in subflow_syn_recv_sock(), MPTCP connection is fully + * established only after we receive the remote key + */ new_ctx->conn_finished = 1; new_ctx->icsk_af_ops = old_ctx->icsk_af_ops; new_ctx->tcp_data_ready = old_ctx->tcp_data_ready; new_ctx->tcp_state_change = old_ctx->tcp_state_change; new_ctx->tcp_write_space = old_ctx->tcp_write_space; new_ctx->mp_capable = 1; - new_ctx->fourth_ack = 1; + new_ctx->fourth_ack = subflow_req->remote_key_valid; + new_ctx->can_ack = subflow_req->remote_key_valid; new_ctx->remote_key = subflow_req->remote_key; new_ctx->local_key = subflow_req->local_key; new_ctx->token = subflow_req->token; From patchwork Wed Jan 22 00:56:33 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Christoph Paasch X-Patchwork-Id: 1226862 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (no SPF record) smtp.mailfrom=lists.01.org (client-ip=2001:19d0:306:5::1; helo=ml01.01.org; envelope-from=mptcp-bounces@lists.01.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=quarantine dis=none) header.from=apple.com Authentication-Results: ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=apple.com header.i=@apple.com header.a=rsa-sha256 header.s=20180706 header.b=uWnsbkQJ; dkim-atps=neutral Received: from ml01.01.org (ml01.01.org [IPv6:2001:19d0:306:5::1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 482Rnk5psLz9sRG for ; Wed, 22 Jan 2020 11:57:14 +1100 (AEDT) Received: from ml01.vlan13.01.org (localhost [IPv6:::1]) by ml01.01.org (Postfix) with ESMTP id 785071007B8EA; Tue, 21 Jan 2020 17:00:31 -0800 (PST) Received-SPF: Pass (mailfrom) identity=mailfrom; client-ip=17.171.2.72; helo=ma1-aaemail-dr-lapp03.apple.com; envelope-from=cpaasch@apple.com; receiver= Received: from ma1-aaemail-dr-lapp03.apple.com (ma1-aaemail-dr-lapp03.apple.com [17.171.2.72]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ml01.01.org (Postfix) with ESMTPS id 27DEB1007B8E1 for ; Tue, 21 Jan 2020 17:00:29 -0800 (PST) Received: from pps.filterd (ma1-aaemail-dr-lapp03.apple.com [127.0.0.1]) by ma1-aaemail-dr-lapp03.apple.com (8.16.0.27/8.16.0.27) with SMTP id 00M0uwKL005654; Tue, 21 Jan 2020 16:57:09 -0800 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=apple.com; h=sender : from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding; s=20180706; bh=FMHlFwyZHC+VJQsNTHiaDaN8n18YpYs+rMlbRatO2t0=; b=uWnsbkQJfSKHz/6obIOMOPrl7cDI7D+GqOf0fCOMshcpMe0Yz/fshvUkUZj1yWe2r3/w EPsIb+1mCkMA02ggRMWBLLiAbw37nNB9R9WJVy8DVjrNE8zO/vlf3PqrIuaRhiagn1UM aw87EDvv1Zxs16urp+FArAMBmkj8JYnzBmlGJmPavY5P8DKHesWtnu9LuKEHFMLoEK8J D5AoCN9CgdZyCenbDtt8jOH1OcMI0EbPAAx+cnwFDhLRqQvr9wasgGeKM75xYcFPU5jG ny/izamc8wFGJbTB2vHaDmoj8+evsQlcxOIDQ9yE5oNbSMI73lo3cnnTlAWCCqlGsRbi Xw== Received: from ma1-mtap-s02.corp.apple.com (ma1-mtap-s02.corp.apple.com [17.40.76.6]) by ma1-aaemail-dr-lapp03.apple.com with ESMTP id 2xm1w0q03h-2 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NO); Tue, 21 Jan 2020 16:57:09 -0800 Received: from nwk-mmpp-sz13.apple.com (nwk-mmpp-sz13.apple.com [17.128.115.216]) by ma1-mtap-s02.corp.apple.com (Oracle Communications Messaging Server 8.0.2.4.20190507 64bit (built May 7 2019)) with ESMTPS id <0Q4H0018HHB4F030@ma1-mtap-s02.corp.apple.com>; Tue, 21 Jan 2020 16:57:05 -0800 (PST) Received: from process_milters-daemon.nwk-mmpp-sz13.apple.com by nwk-mmpp-sz13.apple.com (Oracle Communications Messaging Server 8.0.2.4.20190507 64bit (built May 7 2019)) id <0Q4H00500GU2WL00@nwk-mmpp-sz13.apple.com>; Tue, 21 Jan 2020 16:57:04 -0800 (PST) X-Va-A: X-Va-T-CD: 4b1e0bf36502e052fc75ad21b706ed24 X-Va-E-CD: ebbfb81d12456f1536843673fe0e0127 X-Va-R-CD: 7d07059f566fcad1b63a8abe07cb6839 X-Va-CD: 0 X-Va-ID: 010af09e-cebe-4ee7-87fd-1aa9a976d568 X-V-A: X-V-T-CD: 4b1e0bf36502e052fc75ad21b706ed24 X-V-E-CD: ebbfb81d12456f1536843673fe0e0127 X-V-R-CD: 7d07059f566fcad1b63a8abe07cb6839 X-V-CD: 0 X-V-ID: 252a0f71-884d-4ef7-8c1b-f205c79bca70 X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:,, definitions=2020-01-17_05:,, signatures=0 Received: from localhost ([17.192.155.241]) by nwk-mmpp-sz13.apple.com (Oracle Communications Messaging Server 8.0.2.4.20190507 64bit (built May 7 2019)) with ESMTPSA id <0Q4H0012MHB14Y50@nwk-mmpp-sz13.apple.com>; Tue, 21 Jan 2020 16:57:01 -0800 (PST) Sender: cpaasch@apple.com From: Christoph Paasch To: netdev@vger.kernel.org Date: Tue, 21 Jan 2020 16:56:33 -0800 Message-id: <20200122005633.21229-20-cpaasch@apple.com> X-Mailer: git-send-email 2.23.0 In-reply-to: <20200122005633.21229-1-cpaasch@apple.com> References: <20200122005633.21229-1-cpaasch@apple.com> MIME-version: 1.0 X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:, , definitions=2020-01-17_05:, , signatures=0 Message-ID-Hash: YCTZMLCTCZYHL25HB2X3EVZRDFBCTSLF X-Message-ID-Hash: YCTZMLCTCZYHL25HB2X3EVZRDFBCTSLF X-MailFrom: cpaasch@apple.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; suspicious-header CC: mptcp@lists.01.org X-Mailman-Version: 3.1.1 Precedence: list Subject: [MPTCP] [PATCH net-next v3 19/19] mptcp: cope with later TCP fallback List-Id: Discussions regarding MPTCP upstreaming Archived-At: List-Archive: List-Help: List-Post: List-Subscribe: List-Unsubscribe: From: Paolo Abeni With MPTCP v1, passive connections can fallback to TCP after the subflow becomes established: syn + MP_CAPABLE -> <- syn, ack + MP_CAPABLE ack, seq = 3 -> // OoO packet is accepted because in-sequence // passive socket is created, is in ESTABLISHED // status and tentatively as MP_CAPABLE ack, seq = 2 -> // no MP_CAPABLE opt, subflow should fallback to TCP We can't use the 'subflow' socket fallback, as we don't have it available for passive connection. Instead, when the fallback is detected, replace the mptcp socket with the underlying TCP subflow. Beyond covering the above scenario, it makes a TCP fallback socket as efficient as plain TCP ones. Co-developed-by: Florian Westphal Signed-off-by: Florian Westphal Signed-off-by: Paolo Abeni Signed-off-by: Christoph Paasch --- net/mptcp/protocol.c | 119 ++++++++++++++++++++++++++++++++++++------- net/mptcp/protocol.h | 1 + 2 files changed, 103 insertions(+), 17 deletions(-) diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c index 1b64dfaa5f63..ee916b3686fe 100644 --- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -24,6 +24,58 @@ #define MPTCP_SAME_STATE TCP_MAX_STATES +static void __mptcp_close(struct sock *sk, long timeout); + +static const struct proto_ops * tcp_proto_ops(struct sock *sk) +{ +#if IS_ENABLED(CONFIG_IPV6) + if (sk->sk_family == AF_INET6) + return &inet6_stream_ops; +#endif + return &inet_stream_ops; +} + +/* MP_CAPABLE handshake failed, convert msk to plain tcp, replacing + * socket->sk and stream ops and destroying msk + * return the msk socket, as we can't access msk anymore after this function + * completes + * Called with msk lock held, releases such lock before returning + */ +static struct socket *__mptcp_fallback_to_tcp(struct mptcp_sock *msk, + struct sock *ssk) +{ + struct mptcp_subflow_context *subflow; + struct socket *sock; + struct sock *sk; + + sk = (struct sock *)msk; + sock = sk->sk_socket; + subflow = mptcp_subflow_ctx(ssk); + + /* detach the msk socket */ + list_del_init(&subflow->node); + sock_orphan(sk); + sock->sk = NULL; + + /* socket is now TCP */ + lock_sock(ssk); + sock_graft(ssk, sock); + if (subflow->conn) { + /* We can't release the ULP data on a live socket, + * restore the tcp callback + */ + mptcp_subflow_tcp_fallback(ssk, subflow); + sock_put(subflow->conn); + subflow->conn = NULL; + } + release_sock(ssk); + sock->ops = tcp_proto_ops(ssk); + + /* destroy the left-over msk sock */ + __mptcp_close(sk, 0); + return sock; +} + /* If msk has an initial subflow socket, and the MP_CAPABLE handshake has not * completed yet or has failed, return the subflow socket. * Otherwise return NULL. @@ -36,25 +88,37 @@ static struct socket *__mptcp_nmpc_socket(const struct mptcp_sock *msk) return msk->subflow; } -/* if msk has a single subflow, and the mp_capable handshake is failed, - * return it. +static bool __mptcp_needs_tcp_fallback(const struct mptcp_sock *msk) +{ + return msk->first && !sk_is_mptcp(msk->first); +} + +/* if the mp_capable handshake has failed, it fallbacks msk to plain TCP, + * releases the socket lock and returns a reference to the now TCP socket. * Otherwise returns NULL */ -static struct socket *__mptcp_tcp_fallback(const struct mptcp_sock *msk) +static struct socket *__mptcp_tcp_fallback(struct mptcp_sock *msk) { - struct socket *ssock = __mptcp_nmpc_socket(msk); - sock_owned_by_me((const struct sock *)msk); - if (!ssock || sk_is_mptcp(ssock->sk)) + if (likely(!__mptcp_needs_tcp_fallback(msk))) return NULL; - return ssock; + if (msk->subflow) { + /* the first subflow is an active connection, discart the + * paired socket + */ + msk->subflow->sk = NULL; + sock_release(msk->subflow); + msk->subflow = NULL; + } + + return __mptcp_fallback_to_tcp(msk, msk->first); } static bool __mptcp_can_create_subflow(const struct mptcp_sock *msk) { - return ((struct sock *)msk)->sk_state == TCP_CLOSE; + return !msk->first; } static struct socket *__mptcp_socket_create(struct mptcp_sock *msk, int state) @@ -75,6 +139,7 @@ static struct socket *__mptcp_socket_create(struct mptcp_sock *msk, int state) if (err) return ERR_PTR(err); + msk->first = ssock->sk; msk->subflow = ssock; subflow = mptcp_subflow_ctx(ssock->sk); list_add(&subflow->node, &msk->conn_list); @@ -154,6 +219,8 @@ static int mptcp_sendmsg_frag(struct sock *sk, struct sock *ssk, ret = sk_stream_wait_memory(ssk, timeo); if (ret) return ret; + if (unlikely(__mptcp_needs_tcp_fallback(msk))) + return 0; } /* compute copy limit */ @@ -265,11 +332,11 @@ static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len) lock_sock(sk); ssock = __mptcp_tcp_fallback(msk); - if (ssock) { + if (unlikely(ssock)) { +fallback: pr_debug("fallback passthrough"); ret = sock_sendmsg(ssock, msg); - release_sock(sk); - return ret; + return ret >= 0 ? ret + copied : (copied ? copied : ret); } timeo = sock_sndtimeo(sk, msg->msg_flags & MSG_DONTWAIT); @@ -288,6 +355,11 @@ static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len) &size_goal); if (ret < 0) break; + if (ret == 0 && unlikely(__mptcp_needs_tcp_fallback(msk))) { + release_sock(ssk); + ssock = __mptcp_tcp_fallback(msk); + goto fallback; + } copied += ret; } @@ -368,11 +440,11 @@ static int mptcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, lock_sock(sk); ssock = __mptcp_tcp_fallback(msk); - if (ssock) { + if (unlikely(ssock)) { +fallback: pr_debug("fallback-read subflow=%p", mptcp_subflow_ctx(ssock->sk)); copied = sock_recvmsg(ssock, msg, flags); - release_sock(sk); return copied; } @@ -477,6 +549,8 @@ static int mptcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, pr_debug("block timeout %ld", timeo); wait_data = true; mptcp_wait_data(sk, &timeo); + if (unlikely(__mptcp_tcp_fallback(msk))) + goto fallback; } if (more_data_avail) { @@ -529,6 +603,8 @@ static int __mptcp_init_sock(struct sock *sk) INIT_LIST_HEAD(&msk->conn_list); __set_bit(MPTCP_SEND_SPACE, &msk->flags); + msk->first = NULL; + return 0; } @@ -563,7 +639,8 @@ static void mptcp_subflow_shutdown(struct sock *ssk, int how) release_sock(ssk); } -static void mptcp_close(struct sock *sk, long timeout) +/* Called with msk lock held, releases such lock before returning */ +static void __mptcp_close(struct sock *sk, long timeout) { struct mptcp_subflow_context *subflow, *tmp; struct mptcp_sock *msk = mptcp_sk(sk); @@ -571,8 +648,6 @@ static void mptcp_close(struct sock *sk, long timeout) mptcp_token_destroy(msk->token); inet_sk_state_store(sk, TCP_CLOSE); - lock_sock(sk); - list_for_each_entry_safe(subflow, tmp, &msk->conn_list, node) { struct sock *ssk = mptcp_subflow_tcp_sock(subflow); @@ -585,6 +660,12 @@ static void mptcp_close(struct sock *sk, long timeout) sk_common_release(sk); } +static void mptcp_close(struct sock *sk, long timeout) +{ + lock_sock(sk); + __mptcp_close(sk, timeout); +} + static void mptcp_copy_inaddrs(struct sock *msk, const struct sock *ssk) { #if IS_ENABLED(CONFIG_MPTCP_IPV6) @@ -654,6 +735,7 @@ static struct sock *mptcp_accept(struct sock *sk, int flags, int *err, msk->local_key = subflow->local_key; msk->token = subflow->token; msk->subflow = NULL; + msk->first = newsk; mptcp_token_update_accept(newsk, new_mptcp_sock); @@ -1007,8 +1089,8 @@ static int mptcp_stream_accept(struct socket *sock, struct socket *newsock, static __poll_t mptcp_poll(struct file *file, struct socket *sock, struct poll_table_struct *wait) { - const struct mptcp_sock *msk; struct sock *sk = sock->sk; + struct mptcp_sock *msk; struct socket *ssock; __poll_t mask = 0; @@ -1024,6 +1106,9 @@ static __poll_t mptcp_poll(struct file *file, struct socket *sock, release_sock(sk); sock_poll_wait(file, sock, wait); lock_sock(sk); + ssock = __mptcp_tcp_fallback(msk); + if (unlikely(ssock)) + return ssock->ops->poll(file, ssock, NULL); if (test_bit(MPTCP_DATA_READY, &msk->flags)) mask = EPOLLIN | EPOLLRDNORM; diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h index 10eaa7c7381b..8a99a2930284 100644 --- a/net/mptcp/protocol.h +++ b/net/mptcp/protocol.h @@ -73,6 +73,7 @@ struct mptcp_sock { struct list_head conn_list; struct skb_ext *cached_ext; /* for the next sendmsg */ struct socket *subflow; /* outgoing connect/listener/!mp_capable */ + struct sock *first; }; #define mptcp_for_each_subflow(__msk, __subflow) \