[net,v2,2/3] mptcp: Protect subflow socket options before connection completes
diff mbox series

Message ID 20200213203836.225812-3-mathew.j.martineau@linux.intel.com
State Deferred
Headers show
Series
  • MPTCP fallback fixes
Related show

Commit Message

Mat Martineau Feb. 13, 2020, 8:38 p.m. UTC
Userspace should not be able to directly manipulate subflow socket
options before a connection is established since it is not yet known if
it will be an MPTCP subflow or a TCP fallback subflow. TCP fallback
subflows can be more directly controlled by userspace because they are
regular TCP connections, while MPTCP subflow sockets need to be
configured for the specific needs of MPTCP. Use the same logic as
sendmsg/recvmsg to ensure that socket option calls are only passed
through to known TCP fallback subflows.

Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
---
 net/mptcp/protocol.c | 50 ++++++++++++++++++++++++--------------------
 1 file changed, 27 insertions(+), 23 deletions(-)

Comments

Paolo Abeni Feb. 14, 2020, 11:08 a.m. UTC | #1
On Thu, 2020-02-13 at 12:38 -0800, Mat Martineau wrote:
> Userspace should not be able to directly manipulate subflow socket
> options before a connection is established since it is not yet known if
> it will be an MPTCP subflow or a TCP fallback subflow. TCP fallback
> subflows can be more directly controlled by userspace because they are
> regular TCP connections, while MPTCP subflow sockets need to be
> configured for the specific needs of MPTCP. Use the same logic as
> sendmsg/recvmsg to ensure that socket option calls are only passed
> through to known TCP fallback subflows.
> 
> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
> ---
>  net/mptcp/protocol.c | 50 ++++++++++++++++++++++++--------------------
>  1 file changed, 27 insertions(+), 23 deletions(-)
> 
> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> index a8faf66c38d8..0169e7dfc2d1 100644
> --- a/net/mptcp/protocol.c
> +++ b/net/mptcp/protocol.c
> @@ -756,60 +756,64 @@ static int mptcp_setsockopt(struct sock *sk, int level, int optname,
>  			    char __user *optval, unsigned int optlen)
>  {
>  	struct mptcp_sock *msk = mptcp_sk(sk);
> -	int ret = -EOPNOTSUPP;
>  	struct socket *ssock;
> -	struct sock *ssk;
>  
>  	pr_debug("msk=%p", msk);
>  
>  	/* @@ the meaning of setsockopt() when the socket is connected and
> -	 * there are multiple subflows is not defined.
> +	 * there are multiple subflows is not yet defined. It is up to the
> +	 * MPTCP-level socket to configure the subflows until the subflow
> +	 * is in TCP fallback, when TCP socket options are passed through
> +	 * to the one remaining subflow.
>  	 */
>  	lock_sock(sk);
> -	ssock = __mptcp_socket_create(msk, MPTCP_SAME_STATE);
> -	if (IS_ERR(ssock)) {
> +	ssock = __mptcp_tcp_fallback(msk);
> +	if (ssock) {
> +		struct sock *ssk = ssock->sk;
> +		int ret;
> +
> +		sock_hold(ssk);
>  		release_sock(sk);
> +		ret = tcp_setsockopt(ssk, level, optname, optval, optlen);
> +		sock_put(ssk);
>  		return ret;
>  	}
>  
> -	ssk = ssock->sk;
> -	sock_hold(ssk);
>  	release_sock(sk);
>  
> -	ret = tcp_setsockopt(ssk, level, optname, optval, optlen);
> -	sock_put(ssk);
> -
> -	return ret;
> +	return -EOPNOTSUPP;
>  }
>  
>  static int mptcp_getsockopt(struct sock *sk, int level, int optname,
>  			    char __user *optval, int __user *option)
>  {
>  	struct mptcp_sock *msk = mptcp_sk(sk);
> -	int ret = -EOPNOTSUPP;
>  	struct socket *ssock;
> -	struct sock *ssk;
>  
>  	pr_debug("msk=%p", msk);
>  
> -	/* @@ the meaning of getsockopt() when the socket is connected and
> -	 * there are multiple subflows is not defined.
> +	/* @@ the meaning of setsockopt() when the socket is connected and
> +	 * there are multiple subflows is not yet defined. It is up to the
> +	 * MPTCP-level socket to configure the subflows until the subflow
> +	 * is in TCP fallback, when socket options are passed through
> +	 * to the one remaining subflow.
>  	 */
>  	lock_sock(sk);
> -	ssock = __mptcp_socket_create(msk, MPTCP_SAME_STATE);
> -	if (IS_ERR(ssock)) {
> +	ssock = __mptcp_tcp_fallback(msk);
> +	if (ssock) {
> +		struct sock *ssk = ssock->sk;
> +		int ret;
> +
> +		sock_hold(ssk);
>  		release_sock(sk);

Since I can't wrap my head on it, I'm going to do ask dumb question:

do we need this refcont at all? (here, in poll and in {recv,send}msg) ?

AFAICS:
- subflows can be freed only by mptcp_close()
- mptcp_close() is called by vfs->release() when the last reference to
the file descriptor associated to the msk socket is dropped.
- every syscall wraps vfs ops with fdget()/fdput() pairs so the file
descriptor can't go away during the syscall execution.

Overall I think the race between mptcp_close() and sock_sendmsg() is
not possible, WDYT ?!?

mptcp_close should need to care of possible cuncurrent BH/receive path
processing.

Paolo

Note: the I this doubt since I was wondering why socket-releated inet_*
operations that do not acquire the sock lock still do not have any
issue racing with close().
Mat Martineau Feb. 14, 2020, 7:52 p.m. UTC | #2
On Fri, 14 Feb 2020, Paolo Abeni wrote:

> On Thu, 2020-02-13 at 12:38 -0800, Mat Martineau wrote:
>> Userspace should not be able to directly manipulate subflow socket
>> options before a connection is established since it is not yet known if
>> it will be an MPTCP subflow or a TCP fallback subflow. TCP fallback
>> subflows can be more directly controlled by userspace because they are
>> regular TCP connections, while MPTCP subflow sockets need to be
>> configured for the specific needs of MPTCP. Use the same logic as
>> sendmsg/recvmsg to ensure that socket option calls are only passed
>> through to known TCP fallback subflows.
>>
>> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
>> ---
>>  net/mptcp/protocol.c | 50 ++++++++++++++++++++++++--------------------
>>  1 file changed, 27 insertions(+), 23 deletions(-)
>>
>> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
>> index a8faf66c38d8..0169e7dfc2d1 100644
>> --- a/net/mptcp/protocol.c
>> +++ b/net/mptcp/protocol.c
>> @@ -756,60 +756,64 @@ static int mptcp_setsockopt(struct sock *sk, int level, int optname,
>>  			    char __user *optval, unsigned int optlen)
>>  {
>>  	struct mptcp_sock *msk = mptcp_sk(sk);
>> -	int ret = -EOPNOTSUPP;
>>  	struct socket *ssock;
>> -	struct sock *ssk;
>>
>>  	pr_debug("msk=%p", msk);
>>
>>  	/* @@ the meaning of setsockopt() when the socket is connected and
>> -	 * there are multiple subflows is not defined.
>> +	 * there are multiple subflows is not yet defined. It is up to the
>> +	 * MPTCP-level socket to configure the subflows until the subflow
>> +	 * is in TCP fallback, when TCP socket options are passed through
>> +	 * to the one remaining subflow.
>>  	 */
>>  	lock_sock(sk);
>> -	ssock = __mptcp_socket_create(msk, MPTCP_SAME_STATE);
>> -	if (IS_ERR(ssock)) {
>> +	ssock = __mptcp_tcp_fallback(msk);
>> +	if (ssock) {
>> +		struct sock *ssk = ssock->sk;
>> +		int ret;
>> +
>> +		sock_hold(ssk);
>>  		release_sock(sk);
>> +		ret = tcp_setsockopt(ssk, level, optname, optval, optlen);
>> +		sock_put(ssk);
>>  		return ret;
>>  	}
>>
>> -	ssk = ssock->sk;
>> -	sock_hold(ssk);
>>  	release_sock(sk);
>>
>> -	ret = tcp_setsockopt(ssk, level, optname, optval, optlen);
>> -	sock_put(ssk);
>> -
>> -	return ret;
>> +	return -EOPNOTSUPP;
>>  }
>>
>>  static int mptcp_getsockopt(struct sock *sk, int level, int optname,
>>  			    char __user *optval, int __user *option)
>>  {
>>  	struct mptcp_sock *msk = mptcp_sk(sk);
>> -	int ret = -EOPNOTSUPP;
>>  	struct socket *ssock;
>> -	struct sock *ssk;
>>
>>  	pr_debug("msk=%p", msk);
>>
>> -	/* @@ the meaning of getsockopt() when the socket is connected and
>> -	 * there are multiple subflows is not defined.
>> +	/* @@ the meaning of setsockopt() when the socket is connected and
>> +	 * there are multiple subflows is not yet defined. It is up to the
>> +	 * MPTCP-level socket to configure the subflows until the subflow
>> +	 * is in TCP fallback, when socket options are passed through
>> +	 * to the one remaining subflow.
>>  	 */
>>  	lock_sock(sk);
>> -	ssock = __mptcp_socket_create(msk, MPTCP_SAME_STATE);
>> -	if (IS_ERR(ssock)) {
>> +	ssock = __mptcp_tcp_fallback(msk);
>> +	if (ssock) {
>> +		struct sock *ssk = ssock->sk;
>> +		int ret;
>> +
>> +		sock_hold(ssk);
>>  		release_sock(sk);
>
> Since I can't wrap my head on it, I'm going to do ask dumb question:
>

I think it's a good question!

> do we need this refcont at all? (here, in poll and in {recv,send}msg) ?
>
> AFAICS:
> - subflows can be freed only by mptcp_close()

This is true now, but Florian's patch that removed __mptcp_close() is 
fairly recent. I think the branch I was working from when I got concerned 
about the subflow lifetimes didn't have the change yet, so I the idea in 
my head that there were other paths to the subflow-freeing code that 
relied only on the MPTCP socket lock.

> - mptcp_close() is called by vfs->release() when the last reference to
> the file descriptor associated to the msk socket is dropped.
> - every syscall wraps vfs ops with fdget()/fdput() pairs so the file
> descriptor can't go away during the syscall execution.

While I saw that the MPTCP socket lifetime was managed with fdget/fdput, I 
couldn't track down such protections for the subflow socket structures 
because (as I now see) we're depending on the MPTCP-level socket reference 
count for that.

>
> Overall I think the race between mptcp_close() and sock_sendmsg() is
> not possible, WDYT ?!?

For single subflow, it does seem safe to me now that it's clear when 
mptcp_close is called. I'll drop the fallback changes outside of the 
sockopt functions and send the one sockopt patch to netdev.

>
> mptcp_close should need to care of possible cuncurrent BH/receive path
> processing.
>

Seems like the struct sock reference counts should handle that?

>
> Note: the I this doubt since I was wondering why socket-releated inet_*
> operations that do not acquire the sock lock still do not have any
> issue racing with close().

I'm glad you looked in to it. I was wondering the same thing when I saw 
the lack of fdget/fdput on the "struct socket" for subflows but didn't see 
that those were only freed when the entire MPTCP socket was being torn 
down.

--
Mat Martineau
Intel

Patch
diff mbox series

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index a8faf66c38d8..0169e7dfc2d1 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -756,60 +756,64 @@  static int mptcp_setsockopt(struct sock *sk, int level, int optname,
 			    char __user *optval, unsigned int optlen)
 {
 	struct mptcp_sock *msk = mptcp_sk(sk);
-	int ret = -EOPNOTSUPP;
 	struct socket *ssock;
-	struct sock *ssk;
 
 	pr_debug("msk=%p", msk);
 
 	/* @@ the meaning of setsockopt() when the socket is connected and
-	 * there are multiple subflows is not defined.
+	 * there are multiple subflows is not yet defined. It is up to the
+	 * MPTCP-level socket to configure the subflows until the subflow
+	 * is in TCP fallback, when TCP socket options are passed through
+	 * to the one remaining subflow.
 	 */
 	lock_sock(sk);
-	ssock = __mptcp_socket_create(msk, MPTCP_SAME_STATE);
-	if (IS_ERR(ssock)) {
+	ssock = __mptcp_tcp_fallback(msk);
+	if (ssock) {
+		struct sock *ssk = ssock->sk;
+		int ret;
+
+		sock_hold(ssk);
 		release_sock(sk);
+		ret = tcp_setsockopt(ssk, level, optname, optval, optlen);
+		sock_put(ssk);
 		return ret;
 	}
 
-	ssk = ssock->sk;
-	sock_hold(ssk);
 	release_sock(sk);
 
-	ret = tcp_setsockopt(ssk, level, optname, optval, optlen);
-	sock_put(ssk);
-
-	return ret;
+	return -EOPNOTSUPP;
 }
 
 static int mptcp_getsockopt(struct sock *sk, int level, int optname,
 			    char __user *optval, int __user *option)
 {
 	struct mptcp_sock *msk = mptcp_sk(sk);
-	int ret = -EOPNOTSUPP;
 	struct socket *ssock;
-	struct sock *ssk;
 
 	pr_debug("msk=%p", msk);
 
-	/* @@ the meaning of getsockopt() when the socket is connected and
-	 * there are multiple subflows is not defined.
+	/* @@ the meaning of setsockopt() when the socket is connected and
+	 * there are multiple subflows is not yet defined. It is up to the
+	 * MPTCP-level socket to configure the subflows until the subflow
+	 * is in TCP fallback, when socket options are passed through
+	 * to the one remaining subflow.
 	 */
 	lock_sock(sk);
-	ssock = __mptcp_socket_create(msk, MPTCP_SAME_STATE);
-	if (IS_ERR(ssock)) {
+	ssock = __mptcp_tcp_fallback(msk);
+	if (ssock) {
+		struct sock *ssk = ssock->sk;
+		int ret;
+
+		sock_hold(ssk);
 		release_sock(sk);
+		ret = tcp_getsockopt(ssk, level, optname, optval, option);
+		sock_put(ssk);
 		return ret;
 	}
 
-	ssk = ssock->sk;
-	sock_hold(ssk);
 	release_sock(sk);
 
-	ret = tcp_getsockopt(ssk, level, optname, optval, option);
-	sock_put(ssk);
-
-	return ret;
+	return -EOPNOTSUPP;
 }
 
 static int mptcp_get_port(struct sock *sk, unsigned short snum)