
[v2,mptcp-next,2/3] mptcp: only admit explicitly supported sockopt

Message ID 6bde45df6832e151e57ef75e493b6375cbac9384.1614965103.git.pabeni@redhat.com
State Superseded, archived
Delegated to: Mat Martineau
Series mptcp: use whitlist for sockopt

Commit Message

Paolo Abeni March 5, 2021, 5:31 p.m. UTC
Unrolling mcast state at msk dismantle time is bug prone, as
syzkaller reported:

======================================================
WARNING: possible circular locking dependency detected
5.11.0-syzkaller #0 Not tainted
------------------------------------------------------
syz-executor905/8822 is trying to acquire lock:
ffffffff8d678fe8 (rtnl_mutex){+.+.}-{3:3}, at: ipv6_sock_mc_close+0xd7/0x110 net/ipv6/mcast.c:323

but task is already holding lock:
ffff888024390120 (sk_lock-AF_INET6){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1600 [inline]
ffff888024390120 (sk_lock-AF_INET6){+.+.}-{0:0}, at: mptcp6_release+0x57/0x130 net/mptcp/protocol.c:3507

which lock already depends on the new lock.

Instead, we can simply forbid any mcast-related setsockopt.

Fixes: 717e79c867ca5 ("mptcp: Add setsockopt()/getsockopt() socket operations")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
v1 -> v2:
 - switch from blacklist to whitelist

A note on optname order. For each sol level, I tried to follow the
order used by the corresponding relevant setsockopt(), with the
following additional constraints:
[1] opt which should currently work correctly with no changes come first
[2] then opt that need some impl effort, but should not cause bugs/splat
  ATM
[3] then opt that are really a no-op - and currently work as intended
[4] (commented out) unsupported opt come last, with related reasoning.

Note that [2] is likely too wide. This is intended to start a discussion.
Looks like fixing issue/170 will require a bit of time :(
---
 net/mptcp/sockopt.c | 215 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 215 insertions(+)

Comments

Mat Martineau March 5, 2021, 11:39 p.m. UTC | #1
On Fri, 5 Mar 2021, Paolo Abeni wrote:

> Unrolling mcast state at msk dismantle time is bug prone, as
> syzkaller reported:
>
> ======================================================
> WARNING: possible circular locking dependency detected
> 5.11.0-syzkaller #0 Not tainted
> ------------------------------------------------------
> syz-executor905/8822 is trying to acquire lock:
> ffffffff8d678fe8 (rtnl_mutex){+.+.}-{3:3}, at: ipv6_sock_mc_close+0xd7/0x110 net/ipv6/mcast.c:323
>
> but task is already holding lock:
> ffff888024390120 (sk_lock-AF_INET6){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1600 [inline]
> ffff888024390120 (sk_lock-AF_INET6){+.+.}-{0:0}, at: mptcp6_release+0x57/0x130 net/mptcp/protocol.c:3507
>
> which lock already depends on the new lock.
>
> Instead, we can simply forbid any mcast-related setsockopt.
>
> Fixes: 717e79c867ca5 ("mptcp: Add setsockopt()/getsockopt() socket operations")
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
> v1 -> v2:
> - switch from blacklist to whitelist
>
> A note on optname order. For each sol level, I tried to follow the
> order used by the corresponding relevant setsockopt(), with the
> following additional constraints:
> [1] opt which should currently work correctly with no changes come first
> [2] then opt that need some impl effort, but should not cause bugs/splat
>  ATM
> [3] then opt that are really a no-op - and currently work as intended
> [4] (commented out) unsupported opt come last, with related reasoning.
>
> Note that [2] is likely too wide. This is intended to start a discussion.
> Looks like fixing issue/170 will require a bit of time :(

The overall pattern here looks good to me. I agree that we should 
do more investigation to fine-tune what's in category [2].

I haven't had a chance to review the list of supported sockopts in the 
multipath-tcp.org kernel but I will look at that in more detail next week.

Some more comments below:

> ---
> net/mptcp/sockopt.c | 215 ++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 215 insertions(+)
>
> diff --git a/net/mptcp/sockopt.c b/net/mptcp/sockopt.c
> index 479f756539693..fac8eaf48498e 100644
> --- a/net/mptcp/sockopt.c
> +++ b/net/mptcp/sockopt.c
> @@ -82,6 +82,218 @@ static int mptcp_setsockopt_v6(struct mptcp_sock *msk, int optname,
> 	return ret;
> }
>
> +static bool mptcp_supported_sockopt(int level, int optname)
> +{
> +	if (level == SOL_SOCKET) {
> +		switch (optname) {
> +		case SO_DEBUG:
> +		case SO_REUSEPORT:
> +		case SO_REUSEADDR:
> +
> +		/* the following 23 need a better implementation,
> +		 * but are quite common we want to preserve them
> +		 */
> +		case SO_BINDTODEVICE:
> +		case SO_SNDBUF:
> +		case SO_SNDBUFFORCE:
> +		case SO_RCVBUF:
> +		case SO_RCVBUFFORCE:
> +		case SO_KEEPALIVE:
> +		case SO_PRIORITY:
> +		case SO_LINGER:
> +		case SO_TIMESTAMP_OLD:
> +		case SO_TIMESTAMP_NEW:
> +		case SO_TIMESTAMPNS_OLD:
> +		case SO_TIMESTAMPNS_NEW:
> +		case SO_TIMESTAMPING_OLD:
> +		case SO_TIMESTAMPING_NEW:
> +		case SO_RCVLOWAT:
> +		case SO_RCVTIMEO_OLD:
> +		case SO_RCVTIMEO_NEW:
> +		case SO_MARK:
> +		case SO_INCOMING_CPU:
> +		case SO_BINDTOIFINDEX:
> +#ifdef CONFIG_NET_RX_BUSY_POLL

sock_setsockopt() will take care of returning -ENOPROTOOPT if this config
is not enabled, so it would be better to drop the redundant #ifdef here. As
far as I can tell, removing it doesn't break anything because these options
are unconditionally defined in socket.h.

> +		case SO_BUSY_POLL:
> +		case SO_PREFER_BUSY_POLL:
> +		case SO_BUSY_POLL_BUDGET:
> +#endif
> +
> +		/* next ten are no-op for plain TCP */
> +		case SO_NO_CHECK:
> +		case SO_DONTROUTE:
> +		case SO_BROADCAST:
> +		case SO_BSDCOMPAT:
> +		case SO_PASSCRED:
> +		case SO_PASSSEC:
> +		case SO_RXQ_OVFL:
> +		case SO_WIFI_STATUS:
> +		case SO_NOFCS:
> +		case SO_SELECT_ERR_QUEUE:
> +
> +		/* SO_OOBINLINE is not supported, let's avoid the related mess */
> +		/* SO_ATTACH_FILTER, SO_ATTACH_BPF, SO_ATTACH_REUSEPORT_CBPF,
> +		 * SO_DETACH_REUSEPORT_BPF, SO_DETACH_FILTER, SO_LOCK_FILTER,
> +		 * we must be careful with subflows
> +		 */
> +		/* SO_ATTACH_REUSEPORT_EBPF is not supported, as it checks
> +		 * explicitly the sk_protocol field
> +		 */
> +		/* SO_PEEK_OFF is unsupported, as it is for plain TCP */
> +		/* SO_MAX_PACING_RATE is unsupported, we must be careful with subflows */
> +		/* SO_CNX_ADVICE is currently unsupported, could possibly be relevant,
> +		 * but likely needs careful design
> +		 */
> +		/* SO_ZEROCOPY is currently unsupported, TODO in sndmsg */
> +		/* SO_TXTIME is currently unsupported */

The formatting here makes the intent less clear - if the above
block comments were moved after these lines:

> +			return true;
> +		}

...it would be a lot more clear that the mentioned options are attached to 
the "return false;", and the case labels are for the "return true".

(this applies to the other SOL levels too)


Thanks,

Mat

> +		return false;
> +	}
> +	if (level == SOL_IP) {
> +		switch (optname) {
> +		/* should work fine */
> +		case IP_FREEBIND:
> +		case IP_TRANSPARENT:
> +
> +		/* the following nine control cmsg */
> +		case IP_PKTINFO:
> +		case IP_RECVTTL:
> +		case IP_RECVTOS:
> +		case IP_RECVOPTS:
> +		case IP_RETOPTS:
> +		case IP_PASSSEC:
> +		case IP_RECVORIGDSTADDR:
> +		case IP_CHECKSUM:
> +		case IP_RECVFRAGSIZE:
> +
> +		/* common stuff that need some love */
> +		case IP_TOS:
> +		case IP_TTL:
> +		case IP_BIND_ADDRESS_NO_PORT:
> +		case IP_MTU_DISCOVER:
> +		case IP_RECVERR:
> +
> +		/* possibly less common may deserve some love */
> +		case IP_MINTTL:
> +
> +		/* the following is apparently a no-op for plain TCP */
> +		case IP_RECVERR_RFC4884:
> +
> +		/* IP_OPTIONS is not supported, needs subflow care */
> +		/* IP_HDRINCL, IP_NODEFRAG are not supported, RAW specific */
> +		/* IP_MULTICAST_TTL, IP_MULTICAST_LOOP, IP_UNICAST_IF,
> +		 * IP_ADD_MEMBERSHIP, IP_ADD_SOURCE_MEMBERSHIP, IP_DROP_MEMBERSHIP,
> +		 * IP_DROP_SOURCE_MEMBERSHIP, IP_BLOCK_SOURCE, IP_UNBLOCK_SOURCE,
> +		 * MCAST_JOIN_GROUP, MCAST_LEAVE_GROUP, MCAST_JOIN_SOURCE_GROUP,
> +		 * MCAST_LEAVE_SOURCE_GROUP, MCAST_BLOCK_SOURCE, MCAST_UNBLOCK_SOURCE,
> +		 * MCAST_MSFILTER, IP_MULTICAST_ALL are not supported, better not deal with mcast stuff
> +		 */
> +		/* IP_IPSEC_POLICY, IP_XFRM_POLICY are not supported, unrelated here */
> +			return true;
> +		}
> +		return false;
> +	}
> +	if (level == SOL_IPV6) {
> +		switch (optname) {
> +		case IPV6_V6ONLY:
> +
> +		/* the following 14 control cmsg */
> +		case IPV6_RECVPKTINFO:
> +		case IPV6_2292PKTINFO:
> +		case IPV6_RECVHOPLIMIT:
> +		case IPV6_2292HOPLIMIT:
> +		case IPV6_RECVRTHDR:
> +		case IPV6_2292RTHDR:
> +		case IPV6_RECVHOPOPTS:
> +		case IPV6_2292HOPOPTS:
> +		case IPV6_RECVDSTOPTS:
> +		case IPV6_2292DSTOPTS:
> +		case IPV6_RECVTCLASS:
> +		case IPV6_FLOWINFO:
> +		case IPV6_RECVPATHMTU:
> +		case IPV6_RECVORIGDSTADDR:
> +		case IPV6_RECVFRAGSIZE:
> +
> +		/* the following 14 need some love but are quite common */
> +		case IPV6_TCLASS:
> +		case IPV6_TRANSPARENT:
> +		case IPV6_FREEBIND:
> +		case IPV6_PKTINFO:
> +		case IPV6_2292PKTOPTIONS:
> +		case IPV6_UNICAST_HOPS:
> +		case IPV6_MTU_DISCOVER:
> +		case IPV6_MTU:
> +		case IPV6_RECVERR:
> +		case IPV6_FLOWINFO_SEND:
> +		case IPV6_FLOWLABEL_MGR:
> +		case IPV6_MINHOPCOUNT:
> +		case IPV6_DONTFRAG:
> +		case IPV6_AUTOFLOWLABEL:
> +
> +		/* the following one is a no-op for plain TCP */
> +		case IPV6_RECVERR_RFC4884:
> +
> +		/* IPV6_HOPOPTS, IPV6_RTHDRDSTOPTS, IPV6_RTHDR, IPV6_DSTOPTS are
> +		 * not supported
> +		 */
> +		/* IPV6_MULTICAST_HOPS, IPV6_MULTICAST_LOOP, IPV6_UNICAST_IF,
> +		 * IPV6_MULTICAST_IF, IPV6_ADDRFORM,
> +		 * IPV6_ADD_MEMBERSHIP, IPV6_DROP_MEMBERSHIP, IPV6_JOIN_ANYCAST,
> +		 * IPV6_LEAVE_ANYCAST, IPV6_MULTICAST_ALL, MCAST_JOIN_GROUP, MCAST_LEAVE_GROUP,
> +		 * MCAST_JOIN_SOURCE_GROUP, MCAST_LEAVE_SOURCE_GROUP,
> +		 * MCAST_BLOCK_SOURCE, MCAST_UNBLOCK_SOURCE, MCAST_MSFILTER
> +		 * are not supported, better not deal with mcast
> +		 */
> +		/* IPV6_ROUTER_ALERT, IPV6_ROUTER_ALERT_ISOLATE are not supported, since are evil */
> +
> +		/* IPV6_IPSEC_POLICY, IPV6_XFRM_POLICY are not supported */
> +		/* IPV6_ADDR_PREFERENCES is not supported, we must be careful with subflows */
> +			return true;
> +		}
> +		return false;
> +	}
> +	if (level == SOL_TCP) {
> +		switch (optname) {
> +		/* the following 2 are no-op or should work just fine */
> +		case TCP_THIN_DUPACK:
> +		case TCP_DEFER_ACCEPT:
> +
> +		/* the following 18 need some love */
> +		case TCP_MAXSEG:
> +		case TCP_NODELAY:
> +		case TCP_THIN_LINEAR_TIMEOUTS:
> +		case TCP_CONGESTION:
> +		case TCP_ULP:
> +		case TCP_CORK:
> +		case TCP_KEEPIDLE:
> +		case TCP_KEEPINTVL:
> +		case TCP_KEEPCNT:
> +		case TCP_SYNCNT:
> +		case TCP_SAVE_SYN:
> +		case TCP_LINGER2:
> +		case TCP_WINDOW_CLAMP:
> +		case TCP_QUICKACK:
> +		case TCP_USER_TIMEOUT:
> +		case TCP_TIMESTAMP:
> +		case TCP_NOTSENT_LOWAT:
> +		case TCP_TX_DELAY:
> +
> +		/* TCP_MD5SIG, TCP_MD5SIG_EXT are not supported, MD5 is not compatible with MPTCP */
> +
> +		/* TCP_REPAIR, TCP_REPAIR_QUEUE, TCP_QUEUE_SEQ, TCP_REPAIR_OPTIONS,
> +		 * TCP_REPAIR_WINDOW are not supported, better avoid this mess
> +		 */
> +		/* TCP_FASTOPEN_KEY, TCP_FASTOPEN, TCP_FASTOPEN_CONNECT, TCP_FASTOPEN_NO_COOKIE
> +		 * are not supported, fastopen is currently unsupported
> +		 */
> +		/* TCP_INQ is currently unsupported, needs some recvmsg work */
> +			return true;
> +		}
> +	}
> +	return false;
> +}
> +
> int mptcp_setsockopt(struct sock *sk, int level, int optname,
> 		     sockptr_t optval, unsigned int optlen)
> {
> @@ -90,6 +302,9 @@ int mptcp_setsockopt(struct sock *sk, int level, int optname,
>
> 	pr_debug("msk=%p", msk);
>
> +	if (!mptcp_supported_sockopt(level, optname))
> +		return -ENOPROTOOPT;
> +
> 	if (level == SOL_SOCKET)
> 		return mptcp_setsockopt_sol_socket(msk, optname, optval, optlen);
>
> -- 
> 2.26.2

--
Mat Martineau
Intel
Matthieu Baerts March 6, 2021, 7:30 a.m. UTC | #2
Hi Paolo, Mat,

On 06/03/2021 00:39, Mat Martineau wrote:
> On Fri, 5 Mar 2021, Paolo Abeni wrote:

Thank you for the nice work and the reviews!

(...)

 >> +        /* the following 23 need a better implementation,
 >> +         * but are quite common we want to preserve them
 >> +         */

Maybe we should avoid writing numbers (23) here. It is likely something 
we will forget to update and I don't think they are needed. Same below: 
ten, nine, 14, one, 2, 18.

>> +        /* SO_OOBINLINE is not supported, let's avoid the related 
>> mess */
>> +        /* SO_ATTACH_FILTER, SO_ATTACH_BPF, SO_ATTACH_REUSEPORT_CBPF,
>> +         * SO_DETACH_REUSEPORT_BPF, SO_DETACH_FILTER, SO_LOCK_FILTER,
>> +         * we must be careful with subflows
>> +         */
>> +        /* SO_ATTACH_REUSEPORT_EBPF is not supported, as it checks
>> +         * explicitly the sk_protocol field
>> +         */
>> +        /* SO_PEEK_OFF is unsupported, as it is for plain TCP */
>> +        /* SO_MAX_PACING_RATE is unsupported, we must be careful with 
>> subflows */
>> +        /* SO_CNX_ADVICE is currently unsupported, could possibly be 
>> relevant,
>> +         * but likely needs careful design
>> +         */
>> +        /* SO_ZEROCOPY is currently unsupported, TODO in sndmsg */
>> +        /* SO_TXTIME is currently unsupported */
> 
> The formatting here made the intent less clear - if you move the above 
> block comments were after these lines:
> 
>> +            return true;
>> +        }
> 
> ...it would be a lot more clear that the mentioned options are attached 
> to the "return false;", and the case labels are for the "return true".
> 
> (this applies to the other SOL levels too)

Should we maybe add notes somewhere -- socket.h, in.h, in6.h, tcp.h
-- to ask other devs to contact us when new sockopts are being
added?
Not sure it is a common thing to do, but it could be useful for us and
these devs.

Cheers,
Matt
Paolo Abeni March 8, 2021, 9:07 a.m. UTC | #3
On Sat, 2021-03-06 at 08:30 +0100, Matthieu Baerts wrote:
> Hi Paolo, Mat,
> 
> On 06/03/2021 00:39, Mat Martineau wrote:
> > On Fri, 5 Mar 2021, Paolo Abeni wrote:
> 
> Thank you for the nice work and the reviews!
> 
> (...)
> 
>  >> +        /* the following 23 need a better implementation,
>  >> +         * but are quite common we want to preserve them
>  >> +         */
> 
> Maybe we should avoid writing numbers (23) here. It is likely something 
> we will forget to update and I don't think they are needed. Same below: 
> ten, nine, 14, one, 2, 18.
> 
> > > +        /* SO_OOBINLINE is not supported, let's avoid the related 
> > > mess */
> > > +        /* SO_ATTACH_FILTER, SO_ATTACH_BPF, SO_ATTACH_REUSEPORT_CBPF,
> > > +         * SO_DETACH_REUSEPORT_BPF, SO_DETACH_FILTER, SO_LOCK_FILTER,
> > > +         * we must be careful with subflows
> > > +         */
> > > +        /* SO_ATTACH_REUSEPORT_EBPF is not supported, as it checks
> > > +         * explicitly the sk_protocol field
> > > +         */
> > > +        /* SO_PEEK_OFF is unsupported, as it is for plain TCP */
> > > +        /* SO_MAX_PACING_RATE is unsupported, we must be careful with 
> > > subflows */
> > > +        /* SO_CNX_ADVICE is currently unsupported, could possibly be 
> > > relevant,
> > > +         * but likely needs careful design
> > > +         */
> > > +        /* SO_ZEROCOPY is currently unsupported, TODO in sndmsg */
> > > +        /* SO_TXTIME is currently unsupported */
> > 
> > The formatting here made the intent less clear - if you move the above 
> > block comments were after these lines:
> > 
> > > +            return true;
> > > +        }
> > 
> > ...it would be a lot more clear that the mentioned options are attached 
> > to the "return false;", and the case labels are for the "return true".
> > 
> > (this applies to the other SOL levels too)
> 
> Should we eventually add notes somewhere -- socket.h, in.h, in6.h, tcp.h 
> -- to ask other devs to eventually contact us when new sockopt are being 
> added?
> Not sure it is something common to do but it can be useful for us and 
> these devs.

Replying to this point here, as it will be the only comment I plan to
not include in the next iteration: I think this is on us for the time
being. I would avoid core networking changes.

Additionally, if a new socket option is added, the whitelist model will
fit gracefully, as the new socket option will be unimplemented for
MPTCP and user-space will get back a reasonable error code.
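
A minimal sketch of that default-deny behaviour, assuming a user-space stand-in for the kernel check. The helper names whitelisted() and check_sockopt() are hypothetical and the list is deliberately tiny:

```c
#include <errno.h>
#include <stdbool.h>
#include <sys/socket.h>

/* Hypothetical whitelist: anything not listed is rejected by default. */
static bool whitelisted(int level, int optname)
{
	if (level != SOL_SOCKET)
		return false;

	switch (optname) {
	case SO_REUSEADDR:
	case SO_KEEPALIVE:
		return true;
	}
	return false;
}

/* Hypothetical stand-in for the check at the top of mptcp_setsockopt():
 * returns 0 when the option would be passed on, -ENOPROTOOPT otherwise.
 */
static int check_sockopt(int level, int optname)
{
	if (!whitelisted(level, optname))
		return -ENOPROTOOPT;
	return 0;
}
```

With this shape, a call such as check_sockopt(SOL_SOCKET, 0x7fff), where 0x7fff stands in for an option number added later, yields -ENOPROTOOPT without anyone having to touch the list.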

Cheers,

Paolo
Matthieu Baerts March 8, 2021, 11:30 a.m. UTC | #4
Hi Paolo,

On 08/03/2021 10:07, Paolo Abeni wrote:
> On Sat, 2021-03-06 at 08:30 +0100, Matthieu Baerts wrote:
>> On 06/03/2021 00:39, Mat Martineau wrote:
>>>
>>> ...it would be a lot more clear that the mentioned options are attached
>>> to the "return false;", and the case labels are for the "return true".
>>>
>>> (this applies to the other SOL levels too)
>>
>> Should we eventually add notes somewhere -- socket.h, in.h, in6.h, tcp.h
>> -- to ask other devs to eventually contact us when new sockopt are being
>> added?
>> Not sure it is something common to do but it can be useful for us and
>> these devs.
> 
> Replying to this point here, as it will be the only comment I plan to
> not include in the next iteration: I think this is on us for the time
> being. I would avoid core networking changes.

I understand if we cannot do that.

> Additionally, if a new socket option is added, the whitelist model will
> fit gracefully, as the new socket option will be unimplemented for
> MPTCP and user-space will get back a reasonable error code.

That's fine then. We might only allow a new option after some delay, because
we are probably not going to monitor every change to these sockopts, but I guess
that's fine. At least an error code will be provided.

Cheers,
Matt
Matthieu Baerts March 11, 2021, 8:25 a.m. UTC | #5
Hi Paolo,

On 05/03/2021 18:31, Paolo Abeni wrote:
> Unrolling mcast state at msk dismantle time is bug prone, as
> syzkaller reported:
> 
> ======================================================
> WARNING: possible circular locking dependency detected
> 5.11.0-syzkaller #0 Not tainted
> ------------------------------------------------------
> syz-executor905/8822 is trying to acquire lock:
> ffffffff8d678fe8 (rtnl_mutex){+.+.}-{3:3}, at: ipv6_sock_mc_close+0xd7/0x110 net/ipv6/mcast.c:323
> 
> but task is already holding lock:
> ffff888024390120 (sk_lock-AF_INET6){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1600 [inline]
> ffff888024390120 (sk_lock-AF_INET6){+.+.}-{0:0}, at: mptcp6_release+0x57/0x130 net/mptcp/protocol.c:3507
> 
> which lock already depends on the new lock.
> 
> Instead, we can simply forbid any mcast-related setsockopt.
> 
> Fixes: 717e79c867ca5 ("mptcp: Add setsockopt()/getsockopt() socket operations")
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
> v1 -> v2:
>   - switch from blacklist to whitelist
> 
> A note on optname order. For each sol level, I tried to follow the
> order used by the corresponding relevant setsockopt(), with the
> following additional constraints:
> [1] opt which should currently work correctly with no changes come first
> [2] then opt that need some impl effort, but should not cause bugs/splat
>    ATM
> [3] then opt that are really a no-op - and currently work as intended
> [4] (commented out) unsupported opt come last, with related reasoning.
> 
> Note that [2] is likely too wide. This is intended to start a discussion.
> Looks like fixing issue/170 will require a bit of time :(

Out of curiosity, how did you define this first list? I mean: is it 
normal that some sockopts are not present there: not in a "case" or in 
the comments?

For example, Geliang noticed that when launching eBPF selftests, 
SO_SNDTIMEO* was not allowed for MPTCP sockets. But there is nothing 
about them.

Maybe this patch is enough: https://paste.centos.org/view/854d7e07

It is under testing by Geliang. At least to have the test passing, not 
sure if SO_SNDTIMEO is properly supported yet :)

But should we check (and have a test executed by the CI) to make sure 
all sockopts are at least written somewhere in the new sockopt.c file?

Cheers,
Matt
Paolo Abeni March 11, 2021, 9:25 a.m. UTC | #6
On Thu, 2021-03-11 at 09:25 +0100, Matthieu Baerts wrote:
> On 05/03/2021 18:31, Paolo Abeni wrote:
> > A note on optname order. For each sol level, I tried to follow the
> > order used by the corresponding relevant setsockopt(), with the
> > following additional constraints:
> > [1] opt which should currently work correctly with no changes come first
> > [2] then opt that need some impl effort, but should not cause bugs/splat
> >    ATM
> > [3] then opt that are really a no-op - and currently work as intended
> > [4] (commented out) unsupported opt come last, with related reasoning.
> > 
> > Note that [2] is likely too wide. This is intended to start a discussion.
> > Looks like fixing issue/170 will require a bit of time :(
> 
> Out of curiosity, how did you define this first list? 

Manual review of:
- sock_setsockopt() in net/core/sock.c
- do_ip_setsockopt() in net/ipv4/ip_sockglue.c
- do_ipv6_setsockopt() in net/ipv6/ipv6_sockglue.c
- do_tcp_setsockopt() in net/ipv4/tcp.c

as such is error prone :(

> I mean: is it 
> normal that some sockopts are not present there: not in a "case" or in 
> the comments?

Well, it depends on how much you consider an error to be "normal" :)

I intentionally omitted only nft and bpf stuff.

> For example, Geliang noticed that when launching eBPF selftests, 
> SO_SNDTIMEO* was not allowed for MPTCP sockets. But there is nothing 
> about them.
> 
> Maybe this patch is enough: https://paste.centos.org/view/854d7e07

Yep, that is needed for sure.

> It is under testing by Geliang. At least to have the test passing, not 
> sure if SO_SNDTIMEO is properly supported yet :)

Looks like 'SO_SNDTIMEO' still will not work as expected, because the
setting is applied only to the first subflow, while
mptcp_recvmsg()/mptcp_sendmsg() will look for the timeout value in the
msk socket.
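
The mismatch can be pictured with a toy model. This is not kernel code: every toy_* name is made up for illustration, and a timeout of 0 simply means "never set" in this sketch.

```c
/* Toy model, not kernel code: every name here is hypothetical. */
struct toy_sock {
	long sndtimeo;	/* send timeout; 0 means "never set" in this sketch */
};

struct toy_msk {
	struct toy_sock sk;		/* the MPTCP-level (msk) socket */
	struct toy_sock subflow[2];	/* subflow[0] is the first subflow */
};

/* Models the behaviour described above: the value lands on the
 * first subflow only.
 */
static void toy_set_sndtimeo(struct toy_msk *msk, long timeo)
{
	msk->subflow[0].sndtimeo = timeo;
}

/* Models the send path, which reads the timeout from the msk itself. */
static long toy_sendmsg_timeo(const struct toy_msk *msk)
{
	return msk->sk.sndtimeo;
}
```

After toy_set_sndtimeo() the first subflow carries the value, but toy_sendmsg_timeo() still sees the unset msk-level field, which is the gap being described.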

@Florian: do you think the above could be addressed by the sockopt()
improvements?

> But should we check (and have a test executed by the CI) to make sure 
> all sockopts are at least written somewhere in the new sockopt.c file?

Uhmm... do you mean something like a BUILD_BUG_ON()???

I *think* it would be better to get more coverage via packetdrill ;)

Cheers,

Paolo
Matthieu Baerts March 11, 2021, 9:51 a.m. UTC | #7
Hi Paolo,

Thank you for your quick reply!

On 11/03/2021 10:25, Paolo Abeni wrote:
> On Thu, 2021-03-11 at 09:25 +0100, Matthieu Baerts wrote:
>> On 05/03/2021 18:31, Paolo Abeni wrote:
>>> A note on optname order. For each sol level, I tried to follow the
>>> order used by the corresponding relevant setsockopt(), with the
>>> following additional constraints:
>>> [1] opt which should currently work correctly with no changes come first
>>> [2] then opt that need some impl effort, but should not cause bugs/splat
>>>     ATM
>>> [3] then opt that are really a no-op - and currently work as intended
>>> [4] (commented out) unsupported opt come last, with related reasoning.
>>>
>>> Note that [2] is likely too wide. This is intended to start a discussion.
>>> Looks like fixing issue/170 will require a bit of time :(
>>
>> Out of curiosity, how did you define this first list?
> 
> Manual review of:
> - sock_setsockopt() in net/core/sock.c
> - do_ip_setsockopt() in net/ipv4/ip_sockglue.c
> - do_ipv6_setsockopt() in net/ipv6/ipv6_sockglue.c
> - do_tcp_setsockopt() in net/ipv4/tcp.c
> 
> as such is error prone :(

OK, that's clearer!
Possibly some are only in getsockopt() or have been missed, that's 
alright, it was a very good start :)

>> I mean: is it
>> normal that some sockopts are not present there: not in a "case" or in
>> the comments?
> 
> Well, it depends on how much you consider an error to be "normal" :)
> 
> I intentionally omitted only nft and bpf stuff.

It makes sense for the moment!

>> For example, Geliang noticed that when launching eBPF selftests,
>> SO_SNDTIMEO* was not allowed for MPTCP sockets. But there is nothing
>> about them.
>>
>> Maybe this patch is enough: https://paste.centos.org/view/854d7e07
> 
> Yep, that is needed for sure.

OK, I can send a squash-to patch! (even if we need to improve it later, 
it is likely not causing issues)

>> It is under testing by Geliang. At least to have the test passing, not
>> sure if SO_SNDTIMEO is properly supported yet :)
> 
> Looks like 'SO_SNDTIMEO' still will not work as expected, because the
> setting is applied only to the first subflow, while
> mptcp_recvmsg()/mptcp_sendmsg() will look for the timeout value in the
> msk socket.
> 
> @Florian: do you think the above could be addressed by the sockopt()
> improvements?
> 
>> But should we check (and have a test executed by the CI) to make sure
>> all sockopts are at least written somewhere in the new sockopt.c file?
> 
> Uhmm... do you mean something like a BUILD_BUG_ON()???

No, more a validation in CI scripts. Something a bit more advanced than:

   for i in $(grep "^#define\s\+SO_[A-Z_]\+\s\+[0-9]\+" \
                   include/uapi/asm-generic/socket.h | \
                         awk '{print $2}'); do
       grep -wq ${i} net/mptcp/sockopt.c || echo "Not there: ${i}"
   done

WDYT?

> I *think* it would be better to get more coverage via packetdrill ;)

Indeed, it would be good to have many PacketDrill tests to validate 
these sockopt!

Cheers,
Matt
Florian Westphal March 11, 2021, 11:34 a.m. UTC | #8
On 3/11/21 10:25 AM, Paolo Abeni wrote:
> Looks like 'SO_SNDTIMEO' still will not work as expected, because the
> setting is applied only to the first subflow, while
> mptcp_recvmsg()/mptcp_sendmsg() will look for the timeout value in the
> msk socket.
> 
> @Florian: do you think the above could be addressed by the sockopt()
> improvements?

Sure.

Patch

diff --git a/net/mptcp/sockopt.c b/net/mptcp/sockopt.c
index 479f756539693..fac8eaf48498e 100644
--- a/net/mptcp/sockopt.c
+++ b/net/mptcp/sockopt.c
@@ -82,6 +82,218 @@  static int mptcp_setsockopt_v6(struct mptcp_sock *msk, int optname,
 	return ret;
 }
 
+static bool mptcp_supported_sockopt(int level, int optname)
+{
+	if (level == SOL_SOCKET) {
+		switch (optname) {
+		case SO_DEBUG:
+		case SO_REUSEPORT:
+		case SO_REUSEADDR:
+
+		/* the following 23 need a better implementation,
+		 * but are quite common we want to preserve them
+		 */
+		case SO_BINDTODEVICE:
+		case SO_SNDBUF:
+		case SO_SNDBUFFORCE:
+		case SO_RCVBUF:
+		case SO_RCVBUFFORCE:
+		case SO_KEEPALIVE:
+		case SO_PRIORITY:
+		case SO_LINGER:
+		case SO_TIMESTAMP_OLD:
+		case SO_TIMESTAMP_NEW:
+		case SO_TIMESTAMPNS_OLD:
+		case SO_TIMESTAMPNS_NEW:
+		case SO_TIMESTAMPING_OLD:
+		case SO_TIMESTAMPING_NEW:
+		case SO_RCVLOWAT:
+		case SO_RCVTIMEO_OLD:
+		case SO_RCVTIMEO_NEW:
+		case SO_MARK:
+		case SO_INCOMING_CPU:
+		case SO_BINDTOIFINDEX:
+#ifdef CONFIG_NET_RX_BUSY_POLL
+		case SO_BUSY_POLL:
+		case SO_PREFER_BUSY_POLL:
+		case SO_BUSY_POLL_BUDGET:
+#endif
+
+		/* next ten are no-op for plain TCP */
+		case SO_NO_CHECK:
+		case SO_DONTROUTE:
+		case SO_BROADCAST:
+		case SO_BSDCOMPAT:
+		case SO_PASSCRED:
+		case SO_PASSSEC:
+		case SO_RXQ_OVFL:
+		case SO_WIFI_STATUS:
+		case SO_NOFCS:
+		case SO_SELECT_ERR_QUEUE:
+
+		/* SO_OOBINLINE is not supported, let's avoid the related mess */
+		/* SO_ATTACH_FILTER, SO_ATTACH_BPF, SO_ATTACH_REUSEPORT_CBPF,
+		 * SO_DETACH_REUSEPORT_BPF, SO_DETACH_FILTER, SO_LOCK_FILTER,
+		 * we must be careful with subflows
+		 */
+		/* SO_ATTACH_REUSEPORT_EBPF is not supported, as it checks
+		 * explicitly the sk_protocol field
+		 */
+		/* SO_PEEK_OFF is unsupported, as it is for plain TCP */
+		/* SO_MAX_PACING_RATE is unsupported, we must be careful with subflows */
+		/* SO_CNX_ADVICE is currently unsupported, could possibly be relevant,
+		 * but likely needs careful design
+		 */
+		/* SO_ZEROCOPY is currently unsupported, TODO in sndmsg */
+		/* SO_TXTIME is currently unsupported */
+			return true;
+		}
+		return false;
+	}
+	if (level == SOL_IP) {
+		switch (optname) {
+		/* should work fine */
+		case IP_FREEBIND:
+		case IP_TRANSPARENT:
+
+		/* the following nine control cmsg */
+		case IP_PKTINFO:
+		case IP_RECVTTL:
+		case IP_RECVTOS:
+		case IP_RECVOPTS:
+		case IP_RETOPTS:
+		case IP_PASSSEC:
+		case IP_RECVORIGDSTADDR:
+		case IP_CHECKSUM:
+		case IP_RECVFRAGSIZE:
+
+		/* common stuff that need some love */
+		case IP_TOS:
+		case IP_TTL:
+		case IP_BIND_ADDRESS_NO_PORT:
+		case IP_MTU_DISCOVER:
+		case IP_RECVERR:
+
+		/* possibly less common may deserve some love */
+		case IP_MINTTL:
+
+		/* the following is apparently a no-op for plain TCP */
+		case IP_RECVERR_RFC4884:
+
+		/* IP_OPTIONS is not supported, needs subflow care */
+		/* IP_HDRINCL, IP_NODEFRAG are not supported, RAW specific */
+		/* IP_MULTICAST_TTL, IP_MULTICAST_LOOP, IP_UNICAST_IF,
+		 * IP_ADD_MEMBERSHIP, IP_ADD_SOURCE_MEMBERSHIP, IP_DROP_MEMBERSHIP,
+		 * IP_DROP_SOURCE_MEMBERSHIP, IP_BLOCK_SOURCE, IP_UNBLOCK_SOURCE,
+		 * MCAST_JOIN_GROUP, MCAST_LEAVE_GROUP, MCAST_JOIN_SOURCE_GROUP,
+		 * MCAST_LEAVE_SOURCE_GROUP, MCAST_BLOCK_SOURCE, MCAST_UNBLOCK_SOURCE,
+		 * MCAST_MSFILTER, IP_MULTICAST_ALL are not supported, better not deal with mcast stuff
+		 */
+		/* IP_IPSEC_POLICY, IP_XFRM_POLICY are not supported, unrelated here */
+			return true;
+		}
+		return false;
+	}
+	if (level == SOL_IPV6) {
+		switch (optname) {
+		case IPV6_V6ONLY:
+
+		/* the following 14 control cmsg */
+		case IPV6_RECVPKTINFO:
+		case IPV6_2292PKTINFO:
+		case IPV6_RECVHOPLIMIT:
+		case IPV6_2292HOPLIMIT:
+		case IPV6_RECVRTHDR:
+		case IPV6_2292RTHDR:
+		case IPV6_RECVHOPOPTS:
+		case IPV6_2292HOPOPTS:
+		case IPV6_RECVDSTOPTS:
+		case IPV6_2292DSTOPTS:
+		case IPV6_RECVTCLASS:
+		case IPV6_FLOWINFO:
+		case IPV6_RECVPATHMTU:
+		case IPV6_RECVORIGDSTADDR:
+		case IPV6_RECVFRAGSIZE:
+
+		/* the following 14 need some love but are quite common */
+		case IPV6_TCLASS:
+		case IPV6_TRANSPARENT:
+		case IPV6_FREEBIND:
+		case IPV6_PKTINFO:
+		case IPV6_2292PKTOPTIONS:
+		case IPV6_UNICAST_HOPS:
+		case IPV6_MTU_DISCOVER:
+		case IPV6_MTU:
+		case IPV6_RECVERR:
+		case IPV6_FLOWINFO_SEND:
+		case IPV6_FLOWLABEL_MGR:
+		case IPV6_MINHOPCOUNT:
+		case IPV6_DONTFRAG:
+		case IPV6_AUTOFLOWLABEL:
+
+		/* the following one is a no-op for plain TCP */
+		case IPV6_RECVERR_RFC4884:
+
+		/* IPV6_HOPOPTS, IPV6_RTHDRDSTOPTS, IPV6_RTHDR, IPV6_DSTOPTS are
+		 * not supported
+		 */
+		/* IPV6_MULTICAST_HOPS, IPV6_MULTICAST_LOOP, IPV6_UNICAST_IF,
+		 * IPV6_MULTICAST_IF, IPV6_ADDRFORM,
+		 * IPV6_ADD_MEMBERSHIP, IPV6_DROP_MEMBERSHIP, IPV6_JOIN_ANYCAST,
+		 * IPV6_LEAVE_ANYCAST, IPV6_MULTICAST_ALL, MCAST_JOIN_GROUP, MCAST_LEAVE_GROUP,
+		 * MCAST_JOIN_SOURCE_GROUP, MCAST_LEAVE_SOURCE_GROUP,
+		 * MCAST_BLOCK_SOURCE, MCAST_UNBLOCK_SOURCE, MCAST_MSFILTER
+		 * are not supported, better not deal with mcast
+		 */
+		/* IPV6_ROUTER_ALERT, IPV6_ROUTER_ALERT_ISOLATE are not supported, since are evil */
+
+		/* IPV6_IPSEC_POLICY, IPV6_XFRM_POLICY are not supported */
+		/* IPV6_ADDR_PREFERENCES is not supported, we must be careful with subflows */
+			return true;
+		}
+		return false;
+	}
+	if (level == SOL_TCP) {
+		switch (optname) {
+		/* the following 2 are no-op or should work just fine */
+		case TCP_THIN_DUPACK:
+		case TCP_DEFER_ACCEPT:
+
+		/* the following 18 need some love */
+		case TCP_MAXSEG:
+		case TCP_NODELAY:
+		case TCP_THIN_LINEAR_TIMEOUTS:
+		case TCP_CONGESTION:
+		case TCP_ULP:
+		case TCP_CORK:
+		case TCP_KEEPIDLE:
+		case TCP_KEEPINTVL:
+		case TCP_KEEPCNT:
+		case TCP_SYNCNT:
+		case TCP_SAVE_SYN:
+		case TCP_LINGER2:
+		case TCP_WINDOW_CLAMP:
+		case TCP_QUICKACK:
+		case TCP_USER_TIMEOUT:
+		case TCP_TIMESTAMP:
+		case TCP_NOTSENT_LOWAT:
+		case TCP_TX_DELAY:
+
+		/* TCP_MD5SIG, TCP_MD5SIG_EXT are not supported, MD5 is not compatible with MPTCP */
+
+		/* TCP_REPAIR, TCP_REPAIR_QUEUE, TCP_QUEUE_SEQ, TCP_REPAIR_OPTIONS,
+		 * TCP_REPAIR_WINDOW are not supported, better avoid this mess
+		 */
+		/* TCP_FASTOPEN_KEY, TCP_FASTOPEN, TCP_FASTOPEN_CONNECT, TCP_FASTOPEN_NO_COOKIE
+		 * are not supported, fastopen is currently unsupported
+		 */
+		/* TCP_INQ is currently unsupported, needs some recvmsg work */
+			return true;
+		}
+	}
+	return false;
+}
+
 int mptcp_setsockopt(struct sock *sk, int level, int optname,
 		     sockptr_t optval, unsigned int optlen)
 {
@@ -90,6 +302,9 @@  int mptcp_setsockopt(struct sock *sk, int level, int optname,
 
 	pr_debug("msk=%p", msk);
 
+	if (!mptcp_supported_sockopt(level, optname))
+		return -ENOPROTOOPT;
+
 	if (level == SOL_SOCKET)
 		return mptcp_setsockopt_sol_socket(msk, optname, optval, optlen);