diff mbox series

[RFC,net] mptcp: fallback in case of simultaneous connect

Message ID 20200609173716.1105362-1-dcaratti@redhat.com
State Superseded, archived
Delegated to: Mat Martineau
Headers show
Series [RFC,net] mptcp: fallback in case of simultaneous connect | expand

Commit Message

Davide Caratti June 9, 2020, 5:37 p.m. UTC
when a MPTCP client tries to connect to itself, tcp_finish_connect() is
never reached. Because of this, depending on ther socket current state,
multiple faults can be observed:

1) hitting a WARN_ON() in subflow_data_ready()
 WARNING: CPU: 2 PID: 882 at net/mptcp/subflow.c:911 subflow_data_ready+0x18b/0x230
 [...]
 CPU: 2 PID: 882 Comm: gh35 Not tainted 5.7.0+ #187
 [...]
 RIP: 0010:subflow_data_ready+0x18b/0x230
 [...]
 Call Trace:
  tcp_data_queue+0xd2f/0x4250
  tcp_rcv_state_process+0xb1c/0x49d3
  tcp_v4_do_rcv+0x2bc/0x790
  __release_sock+0x153/0x2d0
  release_sock+0x4f/0x170
  mptcp_shutdown+0x167/0x4e0
  __sys_shutdown+0xe6/0x180
  __x64_sys_shutdown+0x50/0x70
  do_syscall_64+0x9a/0x370
  entry_SYSCALL_64_after_hwframe+0x44/0xa9

2) being stucked forever in mptcp_sendmsg() because the socket is not
   TCP_ESTABLISHED

 crash> bt 4847
 PID: 4847   TASK: ffff88814b2fb100  CPU: 1   COMMAND: "gh35"
  #0 [ffff8881376ff680] __schedule at ffffffff97248da4
  #1 [ffff8881376ff778] schedule at ffffffff9724a34f
  #2 [ffff8881376ff7a0] schedule_timeout at ffffffff97252ba0
  #3 [ffff8881376ff8a8] wait_woken at ffffffff958ab4ba
  #4 [ffff8881376ff940] sk_stream_wait_connect at ffffffff96c2d859
  #5 [ffff8881376ffa28] mptcp_sendmsg at ffffffff97207fca
  #6 [ffff8881376ffbc0] sock_sendmsg at ffffffff96be1b5b
  #7 [ffff8881376ffbe8] sock_write_iter at ffffffff96be1daa
  #8 [ffff8881376ffce8] new_sync_write at ffffffff95e5cb52
  #9 [ffff8881376ffe50] vfs_write at ffffffff95e6547f
 #10 [ffff8881376ffe90] ksys_write at ffffffff95e65d26
 #11 [ffff8881376fff28] do_syscall_64 at ffffffff956088ba
 #12 [ffff8881376fff50] entry_SYSCALL_64_after_hwframe at ffffffff9740008c
     RIP: 00007f126f6956ed  RSP: 00007ffc2a320278  RFLAGS: 00000217
     RAX: ffffffffffffffda  RBX: 0000000020000044  RCX: 00007f126f6956ed
     RDX: 0000000000000004  RSI: 00000000004007b8  RDI: 0000000000000003
     RBP: 00007ffc2a3202a0   R8: 0000000000400720   R9: 0000000000400720
     R10: 0000000000400720  R11: 0000000000000217  R12: 00000000004004b0
     R13: 00007ffc2a320380  R14: 0000000000000000  R15: 0000000000000000
     ORIG_RAX: 0000000000000001  CS: 0033  SS: 002b

3) tcpdump captures showing DSS exchange even when MP_CAPABLE handshake
   didn't complete.

 $ tcpdump -tnnr bad.pcap
 IP 127.0.0.1.20000 > 127.0.0.1.20000: Flags [S], seq 3208913911, win 65483, options [mss 65495,sackOK,TS val 3291706876 ecr 3291694721,nop,wscale 7,mptcp capable v1], length 0
 IP 127.0.0.1.20000 > 127.0.0.1.20000: Flags [S.], seq 3208913911, ack 3208913912, win 65483, options [mss 65495,sackOK,TS val 3291706876 ecr 3291706876,nop,wscale 7,mptcp capable v1], length 0
 IP 127.0.0.1.20000 > 127.0.0.1.20000: Flags [.], ack 1, win 512, options [nop,nop,TS val 3291706876 ecr 3291706876], length 0
 IP 127.0.0.1.20000 > 127.0.0.1.20000: Flags [F.], seq 1, ack 1, win 512, options [nop,nop,TS val 3291707876 ecr 3291706876,mptcp dss fin seq 0 subseq 0 len 1,nop,nop], length 0
 IP 127.0.0.1.20000 > 127.0.0.1.20000: Flags [.], ack 2, win 512, options [nop,nop,TS val 3291707876 ecr 3291707876], length 0

force a fallback to TCP in these cases, and adjust the main socket
state to avoid hanging in sendmsg().

BugLink: https://github.com/multipath-tcp/mptcp_net-next/issues/35
Reported-by: Christoph Paasch <cpaasch@apple.com>
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
---
 net/mptcp/protocol.h | 10 ++++++++++
 net/mptcp/subflow.c  |  9 +++++++++
 2 files changed, 19 insertions(+)

Comments

Christoph Paasch June 9, 2020, 9:50 p.m. UTC | #1
On 06/09/20 - 19:37, Davide Caratti wrote:
> when a MPTCP client tries to connect to itself, tcp_finish_connect() is
> never reached. Because of this, depending on ther socket current state,
> multiple faults can be observed:
> 
> 1) hitting a WARN_ON() in subflow_data_ready()
>  WARNING: CPU: 2 PID: 882 at net/mptcp/subflow.c:911 subflow_data_ready+0x18b/0x230
>  [...]
>  CPU: 2 PID: 882 Comm: gh35 Not tainted 5.7.0+ #187
>  [...]
>  RIP: 0010:subflow_data_ready+0x18b/0x230
>  [...]
>  Call Trace:
>   tcp_data_queue+0xd2f/0x4250
>   tcp_rcv_state_process+0xb1c/0x49d3
>   tcp_v4_do_rcv+0x2bc/0x790
>   __release_sock+0x153/0x2d0
>   release_sock+0x4f/0x170
>   mptcp_shutdown+0x167/0x4e0
>   __sys_shutdown+0xe6/0x180
>   __x64_sys_shutdown+0x50/0x70
>   do_syscall_64+0x9a/0x370
>   entry_SYSCALL_64_after_hwframe+0x44/0xa9
> 
> 2) being stucked forever in mptcp_sendmsg() because the socket is not
>    TCP_ESTABLISHED
> 
>  crash> bt 4847
>  PID: 4847   TASK: ffff88814b2fb100  CPU: 1   COMMAND: "gh35"
>   #0 [ffff8881376ff680] __schedule at ffffffff97248da4
>   #1 [ffff8881376ff778] schedule at ffffffff9724a34f
>   #2 [ffff8881376ff7a0] schedule_timeout at ffffffff97252ba0
>   #3 [ffff8881376ff8a8] wait_woken at ffffffff958ab4ba
>   #4 [ffff8881376ff940] sk_stream_wait_connect at ffffffff96c2d859
>   #5 [ffff8881376ffa28] mptcp_sendmsg at ffffffff97207fca
>   #6 [ffff8881376ffbc0] sock_sendmsg at ffffffff96be1b5b
>   #7 [ffff8881376ffbe8] sock_write_iter at ffffffff96be1daa
>   #8 [ffff8881376ffce8] new_sync_write at ffffffff95e5cb52
>   #9 [ffff8881376ffe50] vfs_write at ffffffff95e6547f
>  #10 [ffff8881376ffe90] ksys_write at ffffffff95e65d26
>  #11 [ffff8881376fff28] do_syscall_64 at ffffffff956088ba
>  #12 [ffff8881376fff50] entry_SYSCALL_64_after_hwframe at ffffffff9740008c
>      RIP: 00007f126f6956ed  RSP: 00007ffc2a320278  RFLAGS: 00000217
>      RAX: ffffffffffffffda  RBX: 0000000020000044  RCX: 00007f126f6956ed
>      RDX: 0000000000000004  RSI: 00000000004007b8  RDI: 0000000000000003
>      RBP: 00007ffc2a3202a0   R8: 0000000000400720   R9: 0000000000400720
>      R10: 0000000000400720  R11: 0000000000000217  R12: 00000000004004b0
>      R13: 00007ffc2a320380  R14: 0000000000000000  R15: 0000000000000000
>      ORIG_RAX: 0000000000000001  CS: 0033  SS: 002b
> 
> 3) tcpdump captures showing DSS exchange even when MP_CAPABLE handshake
>    didn't complete.
> 
>  $ tcpdump -tnnr bad.pcap
>  IP 127.0.0.1.20000 > 127.0.0.1.20000: Flags [S], seq 3208913911, win 65483, options [mss 65495,sackOK,TS val 3291706876 ecr 3291694721,nop,wscale 7,mptcp capable v1], length 0
>  IP 127.0.0.1.20000 > 127.0.0.1.20000: Flags [S.], seq 3208913911, ack 3208913912, win 65483, options [mss 65495,sackOK,TS val 3291706876 ecr 3291706876,nop,wscale 7,mptcp capable v1], length 0
>  IP 127.0.0.1.20000 > 127.0.0.1.20000: Flags [.], ack 1, win 512, options [nop,nop,TS val 3291706876 ecr 3291706876], length 0
>  IP 127.0.0.1.20000 > 127.0.0.1.20000: Flags [F.], seq 1, ack 1, win 512, options [nop,nop,TS val 3291707876 ecr 3291706876,mptcp dss fin seq 0 subseq 0 len 1,nop,nop], length 0
>  IP 127.0.0.1.20000 > 127.0.0.1.20000: Flags [.], ack 2, win 512, options [nop,nop,TS val 3291707876 ecr 3291707876], length 0
> 
> force a fallback to TCP in these cases, and adjust the main socket
> state to avoid hanging in sendmsg().
> 
> BugLink: https://github.com/multipath-tcp/mptcp_net-next/issues/35
> Reported-by: Christoph Paasch <cpaasch@apple.com>
> Suggested-by: Paolo Abeni <pabeni@redhat.com>
> Signed-off-by: Davide Caratti <dcaratti@redhat.com>

syzkaller doesn't trigger anymore! Nice!


Christoph
Mat Martineau June 9, 2020, 10 p.m. UTC | #2
Hi Davide -

On Tue, 9 Jun 2020, Davide Caratti wrote:

> when a MPTCP client tries to connect to itself, tcp_finish_connect() is
> never reached. Because of this, depending on ther socket current state,
> multiple faults can be observed:
>
> 1) hitting a WARN_ON() in subflow_data_ready()
> WARNING: CPU: 2 PID: 882 at net/mptcp/subflow.c:911 subflow_data_ready+0x18b/0x230
> [...]
> CPU: 2 PID: 882 Comm: gh35 Not tainted 5.7.0+ #187
> [...]
> RIP: 0010:subflow_data_ready+0x18b/0x230
> [...]
> Call Trace:
>  tcp_data_queue+0xd2f/0x4250
>  tcp_rcv_state_process+0xb1c/0x49d3
>  tcp_v4_do_rcv+0x2bc/0x790
>  __release_sock+0x153/0x2d0
>  release_sock+0x4f/0x170
>  mptcp_shutdown+0x167/0x4e0
>  __sys_shutdown+0xe6/0x180
>  __x64_sys_shutdown+0x50/0x70
>  do_syscall_64+0x9a/0x370
>  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
> 2) being stucked forever in mptcp_sendmsg() because the socket is not
>   TCP_ESTABLISHED
>
> crash> bt 4847
> PID: 4847   TASK: ffff88814b2fb100  CPU: 1   COMMAND: "gh35"
>  #0 [ffff8881376ff680] __schedule at ffffffff97248da4
>  #1 [ffff8881376ff778] schedule at ffffffff9724a34f
>  #2 [ffff8881376ff7a0] schedule_timeout at ffffffff97252ba0
>  #3 [ffff8881376ff8a8] wait_woken at ffffffff958ab4ba
>  #4 [ffff8881376ff940] sk_stream_wait_connect at ffffffff96c2d859
>  #5 [ffff8881376ffa28] mptcp_sendmsg at ffffffff97207fca
>  #6 [ffff8881376ffbc0] sock_sendmsg at ffffffff96be1b5b
>  #7 [ffff8881376ffbe8] sock_write_iter at ffffffff96be1daa
>  #8 [ffff8881376ffce8] new_sync_write at ffffffff95e5cb52
>  #9 [ffff8881376ffe50] vfs_write at ffffffff95e6547f
> #10 [ffff8881376ffe90] ksys_write at ffffffff95e65d26
> #11 [ffff8881376fff28] do_syscall_64 at ffffffff956088ba
> #12 [ffff8881376fff50] entry_SYSCALL_64_after_hwframe at ffffffff9740008c
>     RIP: 00007f126f6956ed  RSP: 00007ffc2a320278  RFLAGS: 00000217
>     RAX: ffffffffffffffda  RBX: 0000000020000044  RCX: 00007f126f6956ed
>     RDX: 0000000000000004  RSI: 00000000004007b8  RDI: 0000000000000003
>     RBP: 00007ffc2a3202a0   R8: 0000000000400720   R9: 0000000000400720
>     R10: 0000000000400720  R11: 0000000000000217  R12: 00000000004004b0
>     R13: 00007ffc2a320380  R14: 0000000000000000  R15: 0000000000000000
>     ORIG_RAX: 0000000000000001  CS: 0033  SS: 002b
>
> 3) tcpdump captures showing DSS exchange even when MP_CAPABLE handshake
>   didn't complete.
>
> $ tcpdump -tnnr bad.pcap
> IP 127.0.0.1.20000 > 127.0.0.1.20000: Flags [S], seq 3208913911, win 65483, options [mss 65495,sackOK,TS val 3291706876 ecr 3291694721,nop,wscale 7,mptcp capable v1], length 0
> IP 127.0.0.1.20000 > 127.0.0.1.20000: Flags [S.], seq 3208913911, ack 3208913912, win 65483, options [mss 65495,sackOK,TS val 3291706876 ecr 3291706876,nop,wscale 7,mptcp capable v1], length 0
> IP 127.0.0.1.20000 > 127.0.0.1.20000: Flags [.], ack 1, win 512, options [nop,nop,TS val 3291706876 ecr 3291706876], length 0
> IP 127.0.0.1.20000 > 127.0.0.1.20000: Flags [F.], seq 1, ack 1, win 512, options [nop,nop,TS val 3291707876 ecr 3291706876,mptcp dss fin seq 0 subseq 0 len 1,nop,nop], length 0
> IP 127.0.0.1.20000 > 127.0.0.1.20000: Flags [.], ack 2, win 512, options [nop,nop,TS val 3291707876 ecr 3291707876], length 0
>
> force a fallback to TCP in these cases, and adjust the main socket
> state to avoid hanging in sendmsg().
>
> BugLink: https://github.com/multipath-tcp/mptcp_net-next/issues/35
> Reported-by: Christoph Paasch <cpaasch@apple.com>
> Suggested-by: Paolo Abeni <pabeni@redhat.com>
> Signed-off-by: Davide Caratti <dcaratti@redhat.com>
> ---
> net/mptcp/protocol.h | 10 ++++++++++
> net/mptcp/subflow.c  |  9 +++++++++
> 2 files changed, 19 insertions(+)
>
> diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
> index e4341b278464..cdd0eaeebf0c 100644
> --- a/net/mptcp/protocol.h
> +++ b/net/mptcp/protocol.h
> @@ -500,4 +500,14 @@ static inline void mptcp_do_fallback(struct sock *sk)
> #define pr_fallback(a) do { pr_debug("%s:fallback to TCP (msk=%p)",\
> 				      __FUNCTION__, a); } while (0)
>
> +static inline bool subflow_simultaneous_connect(struct sock *sk)
> +{
> +	struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(sk);
> +	struct sock *parent = subflow->conn;
> +
> +	return sk->sk_state == TCP_ESTABLISHED &&
> +	       !mptcp_sk(parent)->pm.server_side &&
> +	       !subflow->conn_finished;
> +}
> +
> #endif /* __MPTCP_PROTOCOL_H */
> diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
> index 439170f798c8..eccc19e1adbe 100644
> --- a/net/mptcp/subflow.c
> +++ b/net/mptcp/subflow.c
> @@ -1128,6 +1128,15 @@ static void subflow_state_change(struct sock *sk)
>
> 	__subflow_state_change(sk);
>
> +	if (subflow_simultaneous_connect(sk)) {
> +		mptcp_do_fallback(sk);
> +		pr_fallback(mptcp_sk(parent));
> +		if (inet_sk_state_load(parent) == TCP_SYN_SENT) {
> +			inet_sk_state_store(parent, TCP_ESTABLISHED);
> +			parent->sk_state_change(parent);
> +		}

Does it also make sense to set subflow->conn_finished to match more 
closely what happens in subflow_finish_connect()? Looks like this would 
mainly affect the MPTCP_SUBFLOW_FLAG_CONNECTED that gets reported to 
userspace in diag.c

Other than that, looks good. Thanks!

> +	}
> +
> 	/* as recvmsg() does not acquire the subflow socket for ssk selection
> 	 * a fin packet carrying a DSS can be unnoticed if we don't trigger
> 	 * the data available machinery here.
> --

--
Mat Martineau
Intel
Davide Caratti June 10, 2020, 9:59 a.m. UTC | #3
On Tue, 2020-06-09 at 15:00 -0700, Mat Martineau wrote:
> Does it also make sense to set subflow->conn_finished to match more 
> closely what happens in subflow_finish_connect()? Looks like this would 
> mainly affect the MPTCP_SUBFLOW_FLAG_CONNECTED that gets reported to 
> userspace in diag.c

yes, it makes sense: I will respin a v2 with:

- this addition
- the typo fixed in the commit message
- the subject prefix targeting the correct branch: since it depends on the
fallback rework, it should target net-next, not net (and I would have
difficulties in detecting a proper 'Fixes:' tag),

thanks for looking at this!
Christoph Paasch June 10, 2020, 3:25 p.m. UTC | #4
Hello,

On 06/10/20 - 11:59, Davide Caratti wrote:
> On Tue, 2020-06-09 at 15:00 -0700, Mat Martineau wrote:
> > Does it also make sense to set subflow->conn_finished to match more 
> > closely what happens in subflow_finish_connect()? Looks like this would 
> > mainly affect the MPTCP_SUBFLOW_FLAG_CONNECTED that gets reported to 
> > userspace in diag.c
> 
> yes, it makes sense: I will respin a v2 with:
> 
> - this addition
> - the typo fixed in the commit message
> - the subject prefix targeting the correct branch: since it depends on the
> fallback rework, it should target net-next, not net (and I would have
> difficulties in detecting a proper 'Fixes:' tag),
> 
> thanks for looking at this!

syzkaller found another WARNING in subflow_data_ready():

https://github.com/multipath-tcp/mptcp_net-next/issues/39

Seems to be different than the on in issue/35.


Christoph
diff mbox series

Patch

diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index e4341b278464..cdd0eaeebf0c 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -500,4 +500,14 @@  static inline void mptcp_do_fallback(struct sock *sk)
 #define pr_fallback(a) do { pr_debug("%s:fallback to TCP (msk=%p)",\
 				      __FUNCTION__, a); } while (0)
 
+static inline bool subflow_simultaneous_connect(struct sock *sk)
+{
+	struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(sk);
+	struct sock *parent = subflow->conn;
+
+	return sk->sk_state == TCP_ESTABLISHED &&
+	       !mptcp_sk(parent)->pm.server_side &&
+	       !subflow->conn_finished;
+}
+
 #endif /* __MPTCP_PROTOCOL_H */
diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
index 439170f798c8..eccc19e1adbe 100644
--- a/net/mptcp/subflow.c
+++ b/net/mptcp/subflow.c
@@ -1128,6 +1128,15 @@  static void subflow_state_change(struct sock *sk)
 
 	__subflow_state_change(sk);
 
+	if (subflow_simultaneous_connect(sk)) {
+		mptcp_do_fallback(sk);
+		pr_fallback(mptcp_sk(parent));
+		if (inet_sk_state_load(parent) == TCP_SYN_SENT) {
+			inet_sk_state_store(parent, TCP_ESTABLISHED);
+			parent->sk_state_change(parent);
+		}
+	}
+
 	/* as recvmsg() does not acquire the subflow socket for ssk selection
 	 * a fin packet carrying a DSS can be unnoticed if we don't trigger
 	 * the data available machinery here.