diff mbox series

[net-next] mptcp: fallback in case of simultaneous connect

Message ID 5be9e5da904e5930b3a376c4aae6e2c908a6ba3f.1591884399.git.dcaratti@redhat.com
State Accepted, archived
Delegated to: Matthieu Baerts

Commit Message

Davide Caratti June 11, 2020, 2:09 p.m. UTC
When an MPTCP client tries to connect to itself, tcp_finish_connect() is
never reached. Because of this, depending on the socket's current state,
multiple faulty behaviours can be observed:

1) a WARN_ON() in subflow_data_ready() is hit
 WARNING: CPU: 2 PID: 882 at net/mptcp/subflow.c:911 subflow_data_ready+0x18b/0x230
 [...]
 CPU: 2 PID: 882 Comm: gh35 Not tainted 5.7.0+ #187
 [...]
 RIP: 0010:subflow_data_ready+0x18b/0x230
 [...]
 Call Trace:
  tcp_data_queue+0xd2f/0x4250
  tcp_rcv_state_process+0xb1c/0x49d3
  tcp_v4_do_rcv+0x2bc/0x790
  __release_sock+0x153/0x2d0
  release_sock+0x4f/0x170
  mptcp_shutdown+0x167/0x4e0
  __sys_shutdown+0xe6/0x180
  __x64_sys_shutdown+0x50/0x70
  do_syscall_64+0x9a/0x370
  entry_SYSCALL_64_after_hwframe+0x44/0xa9

2) the client is stuck forever in mptcp_sendmsg() because the socket is
   not in the TCP_ESTABLISHED state

 crash> bt 4847
 PID: 4847   TASK: ffff88814b2fb100  CPU: 1   COMMAND: "gh35"
  #0 [ffff8881376ff680] __schedule at ffffffff97248da4
  #1 [ffff8881376ff778] schedule at ffffffff9724a34f
  #2 [ffff8881376ff7a0] schedule_timeout at ffffffff97252ba0
  #3 [ffff8881376ff8a8] wait_woken at ffffffff958ab4ba
  #4 [ffff8881376ff940] sk_stream_wait_connect at ffffffff96c2d859
  #5 [ffff8881376ffa28] mptcp_sendmsg at ffffffff97207fca
  #6 [ffff8881376ffbc0] sock_sendmsg at ffffffff96be1b5b
  #7 [ffff8881376ffbe8] sock_write_iter at ffffffff96be1daa
  #8 [ffff8881376ffce8] new_sync_write at ffffffff95e5cb52
  #9 [ffff8881376ffe50] vfs_write at ffffffff95e6547f
 #10 [ffff8881376ffe90] ksys_write at ffffffff95e65d26
 #11 [ffff8881376fff28] do_syscall_64 at ffffffff956088ba
 #12 [ffff8881376fff50] entry_SYSCALL_64_after_hwframe at ffffffff9740008c
     RIP: 00007f126f6956ed  RSP: 00007ffc2a320278  RFLAGS: 00000217
     RAX: ffffffffffffffda  RBX: 0000000020000044  RCX: 00007f126f6956ed
     RDX: 0000000000000004  RSI: 00000000004007b8  RDI: 0000000000000003
     RBP: 00007ffc2a3202a0   R8: 0000000000400720   R9: 0000000000400720
     R10: 0000000000400720  R11: 0000000000000217  R12: 00000000004004b0
     R13: 00007ffc2a320380  R14: 0000000000000000  R15: 0000000000000000
     ORIG_RAX: 0000000000000001  CS: 0033  SS: 002b

3) tcpdump captures show that DSS is exchanged even though the MP_CAPABLE
   handshake did not complete.

 $ tcpdump -tnnr bad.pcap
 IP 127.0.0.1.20000 > 127.0.0.1.20000: Flags [S], seq 3208913911, win 65483, options [mss 65495,sackOK,TS val 3291706876 ecr 3291694721,nop,wscale 7,mptcp capable v1], length 0
 IP 127.0.0.1.20000 > 127.0.0.1.20000: Flags [S.], seq 3208913911, ack 3208913912, win 65483, options [mss 65495,sackOK,TS val 3291706876 ecr 3291706876,nop,wscale 7,mptcp capable v1], length 0
 IP 127.0.0.1.20000 > 127.0.0.1.20000: Flags [.], ack 1, win 512, options [nop,nop,TS val 3291706876 ecr 3291706876], length 0
 IP 127.0.0.1.20000 > 127.0.0.1.20000: Flags [F.], seq 1, ack 1, win 512, options [nop,nop,TS val 3291707876 ecr 3291706876,mptcp dss fin seq 0 subseq 0 len 1,nop,nop], length 0
 IP 127.0.0.1.20000 > 127.0.0.1.20000: Flags [.], ack 2, win 512, options [nop,nop,TS val 3291707876 ecr 3291707876], length 0

Force a fallback to TCP in these cases, and adjust the main socket
state to avoid hanging in mptcp_sendmsg().

BugLink: https://github.com/multipath-tcp/mptcp_net-next/issues/35
Reported-by: Christoph Paasch <cpaasch@apple.com>
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
---
 net/mptcp/protocol.h | 10 ++++++++++
 net/mptcp/subflow.c  | 10 ++++++++++
 2 files changed, 20 insertions(+)

Comments

Mat Martineau June 11, 2020, 10:57 p.m. UTC | #1
On Thu, 11 Jun 2020, Davide Caratti wrote:

> when a MPTCP client tries to connect to itself, tcp_finish_connect() is
> never reached. Because of this, depending on the socket current state,
> multiple faulty behaviours can be observed:
(...)

Thanks for the v2 Davide, looks good to me.

--
Mat Martineau
Intel
Matthieu Baerts June 15, 2020, 3:06 p.m. UTC | #2
Hi Davide, Mat,

On 12/06/2020 00:57, Mat Martineau wrote:
> On Thu, 11 Jun 2020, Davide Caratti wrote:
> 
>> when a MPTCP client tries to connect to itself, tcp_finish_connect() is
>> never reached. Because of this, depending on the socket current state,
>> multiple faulty behaviours can be observed:

(...)

> Thanks for the v2 Davide, looks good to me.

Thank you for the patch and the review!

Just added after "net: mptcp: improve fallback to TCP":

- 295da4ec8264: mptcp: fallback in case of simultaneous connect

(I replaced the "BugLink:" tag with "Closes:" to have the reference on GitHub.)

Tests + export are in progress!

Cheers,
Matt

Patch

diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index dc3bfcbc5ba29..634216e03f777 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -507,4 +507,14 @@  static inline void mptcp_do_fallback(struct sock *sk)
 #define pr_fallback(a) do { pr_debug("%s:fallback to TCP (msk=%p)",\
 				      __FUNCTION__, a); } while (0)
 
+static inline bool subflow_simultaneous_connect(struct sock *sk)
+{
+	struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(sk);
+	struct sock *parent = subflow->conn;
+
+	return sk->sk_state == TCP_ESTABLISHED &&
+	       !mptcp_sk(parent)->pm.server_side &&
+	       !subflow->conn_finished;
+}
+
 #endif /* __MPTCP_PROTOCOL_H */
diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
index 417a82c12dddd..5a1e87301124f 100644
--- a/net/mptcp/subflow.c
+++ b/net/mptcp/subflow.c
@@ -1119,6 +1119,16 @@  static void subflow_state_change(struct sock *sk)
 
 	__subflow_state_change(sk);
 
+	if (subflow_simultaneous_connect(sk)) {
+		mptcp_do_fallback(sk);
+		pr_fallback(mptcp_sk(parent));
+		subflow->conn_finished = 1;
+		if (inet_sk_state_load(parent) == TCP_SYN_SENT) {
+			inet_sk_state_store(parent, TCP_ESTABLISHED);
+			parent->sk_state_change(parent);
+		}
+	}
+
 	/* as recvmsg() does not acquire the subflow socket for ssk selection
 	 * a fin packet carrying a DSS can be unnoticed if we don't trigger
 	 * the data available machinery here.