diff mbox series

[net-next,v2] net: core: enable SO_BINDTODEVICE for non-root users

Message ID 20200331132009.1306283-1-vincent@bernat.ch
State Accepted
Delegated to: David Miller
Headers show
Series [net-next,v2] net: core: enable SO_BINDTODEVICE for non-root users | expand

Commit Message

Vincent Bernat March 31, 2020, 1:20 p.m. UTC
Currently, SO_BINDTODEVICE requires CAP_NET_RAW. This change allows a
non-root user to bind a socket to an interface if it is not already
bound. This is useful to allow an application to bind itself to a
specific VRF for outgoing or incoming connections. Currently, an
application wanting to manage connections through several VRF need to
be privileged.

Previously, IP_UNICAST_IF and IPV6_UNICAST_IF were added for
Wine (76e21053b5bf3 and c4062dfc425e9) specifically for use by
non-root processes. However, they are restricted to sendmsg() and not
usable with TCP. Allowing SO_BINDTODEVICE would allow TCP clients to
get the same privilege. As for TCP servers, outside the VRF use case,
SO_BINDTODEVICE would only further restrict connections a server could
accept.

When an application is restricted to a VRF (with `ip vrf exec`), the
socket is bound to an interface at creation and therefore, a
non-privileged call to SO_BINDTODEVICE to escape the VRF fails.

When an application bound a socket to SO_BINDTODEVICE and transmit it
to a non-privileged process through a Unix socket, a tentative to
change the bound device also fails.

Before:

    >>> import socket
    >>> s=socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    >>> s.setsockopt(socket.SOL_SOCKET, socket.SO_BINDTODEVICE, b"dummy0")
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    PermissionError: [Errno 1] Operation not permitted

After:

    >>> import socket
    >>> s=socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    >>> s.setsockopt(socket.SOL_SOCKET, socket.SO_BINDTODEVICE, b"dummy0")
    >>> s.setsockopt(socket.SOL_SOCKET, socket.SO_BINDTODEVICE, b"dummy0")
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    PermissionError: [Errno 1] Operation not permitted

Signed-off-by: Vincent Bernat <vincent@bernat.ch>
---
 net/core/sock.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

David Ahern April 2, 2020, 5:31 p.m. UTC | #1
On 3/31/20 7:20 AM, Vincent Bernat wrote:
> Currently, SO_BINDTODEVICE requires CAP_NET_RAW. This change allows a
> non-root user to bind a socket to an interface if it is not already
> bound. This is useful to allow an application to bind itself to a
> specific VRF for outgoing or incoming connections. Currently, an
> application wanting to manage connections through several VRF need to
> be privileged.
> 
> Previously, IP_UNICAST_IF and IPV6_UNICAST_IF were added for
> Wine (76e21053b5bf3 and c4062dfc425e9) specifically for use by
> non-root processes. However, they are restricted to sendmsg() and not
> usable with TCP. Allowing SO_BINDTODEVICE would allow TCP clients to
> get the same privilege. As for TCP servers, outside the VRF use case,
> SO_BINDTODEVICE would only further restrict connections a server could
> accept.
> 
> When an application is restricted to a VRF (with `ip vrf exec`), the
> socket is bound to an interface at creation and therefore, a
> non-privileged call to SO_BINDTODEVICE to escape the VRF fails.
> 
> When an application bound a socket to SO_BINDTODEVICE and transmit it
> to a non-privileged process through a Unix socket, a tentative to
> change the bound device also fails.
> 
> Before:
> 
>     >>> import socket
>     >>> s=socket.socket(socket.AF_INET, socket.SOCK_STREAM)
>     >>> s.setsockopt(socket.SOL_SOCKET, socket.SO_BINDTODEVICE, b"dummy0")
>     Traceback (most recent call last):
>       File "<stdin>", line 1, in <module>
>     PermissionError: [Errno 1] Operation not permitted
> 
> After:
> 
>     >>> import socket
>     >>> s=socket.socket(socket.AF_INET, socket.SOCK_STREAM)
>     >>> s.setsockopt(socket.SOL_SOCKET, socket.SO_BINDTODEVICE, b"dummy0")
>     >>> s.setsockopt(socket.SOL_SOCKET, socket.SO_BINDTODEVICE, b"dummy0")
>     Traceback (most recent call last):
>       File "<stdin>", line 1, in <module>
>     PermissionError: [Errno 1] Operation not permitted
> 
> Signed-off-by: Vincent Bernat <vincent@bernat.ch>
> ---
>  net/core/sock.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/net/core/sock.c b/net/core/sock.c
> index da32d9b6d09f..ce1d8dce9b7a 100644
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -574,7 +574,7 @@ static int sock_setbindtodevice_locked(struct sock *sk, int ifindex)
>  
>  	/* Sorry... */
>  	ret = -EPERM;
> -	if (!ns_capable(net->user_ns, CAP_NET_RAW))
> +	if (sk->sk_bound_dev_if && !ns_capable(net->user_ns, CAP_NET_RAW))
>  		goto out;
>  
>  	ret = -EINVAL;
> 


Reviewed-by: David Ahern <dsahern@gmail.com>
David Miller April 3, 2020, 12:47 a.m. UTC | #2
From: Vincent Bernat <vincent@bernat.ch>
Date: Tue, 31 Mar 2020 15:20:10 +0200

> Currently, SO_BINDTODEVICE requires CAP_NET_RAW. This change allows a
> non-root user to bind a socket to an interface if it is not already
> bound.
 ...

Ok I'm convinced now, thanks for your patience.

Applied, thanks.
Vincent Bernat Oct. 23, 2020, 10:02 a.m. UTC | #3
❦  2 avril 2020 17:47 -07, David Miller:

>> Currently, SO_BINDTODEVICE requires CAP_NET_RAW. This change allows a
>> non-root user to bind a socket to an interface if it is not already
>> bound.
>  ...
>
> Ok I'm convinced now, thanks for your patience.

I've got some user feedback about this patch. I didn't think the patch
would allow to circumvent routing policies on most common setups, but
VPN may setup a default route with a lower metric and an application may
(on purpose or by accident) use SO_BINDTODEVICE to circumvent the lower
metric route:

default via 10.81.0.1 dev tun0 proto static metric 50
default via 192.168.122.1 dev enp1s0 proto dhcp metric 100

I am wondering if we should revert the patch for 5.10 while we can,
waiting for a better solution (and breaking people relying on the new
behavior in 5.9).

Then, I can propose a patch with a sysctl to avoid breaking existing
setups.
David Ahern Oct. 23, 2020, 2:40 p.m. UTC | #4
On 10/23/20 4:02 AM, Vincent Bernat wrote:
>  ❦  2 avril 2020 17:47 -07, David Miller:
> 
>>> Currently, SO_BINDTODEVICE requires CAP_NET_RAW. This change allows a
>>> non-root user to bind a socket to an interface if it is not already
>>> bound.
>>  ...
>>
>> Ok I'm convinced now, thanks for your patience.
> 
> I've got some user feedback about this patch. I didn't think the patch
> would allow to circumvent routing policies on most common setups, but
> VPN may setup a default route with a lower metric and an application may
> (on purpose or by accident) use SO_BINDTODEVICE to circumvent the lower
> metric route:
> 
> default via 10.81.0.1 dev tun0 proto static metric 50
> default via 192.168.122.1 dev enp1s0 proto dhcp metric 100

I thought we discussed this at the time you submitted your patch. That
was a known issue then, so nothing has really changed. Again, this patch
just brings equivalence to TCP for capabilities in UDP and raw sockets.

> 
> I am wondering if we should revert the patch for 5.10 while we can,
> waiting for a better solution (and breaking people relying on the new
> behavior in 5.9).
> 
> Then, I can propose a patch with a sysctl to avoid breaking existing
> setups.
> 

I have not walked the details, but it seems like a security policy can
be installed to get the previous behavior.
Vincent Bernat Oct. 27, 2020, 7:17 a.m. UTC | #5
❦ 23 octobre 2020 08:40 -06, David Ahern:

>> I am wondering if we should revert the patch for 5.10 while we can,
>> waiting for a better solution (and breaking people relying on the new
>> behavior in 5.9).
>> 
>> Then, I can propose a patch with a sysctl to avoid breaking existing
>> setups.
>> 
>
> I have not walked the details, but it seems like a security policy can
> be installed to get the previous behavior.

libtorrent is using SO_BINDTODEVICE for some reason (code is quite old,
so not git history). Previously, the call was unsuccesful and the error
was logged and ignored. Now, it succeeds and circumvent the routing
policy. Using Netfiler does not help as libtorrent won't act on dropped
packets as the socket is already configured on the wrong interface.
kprobe is unable to modify a syscall and seccomp cannot be applied
globally. LSM are usually distro specific. What kind of security policy
do you have in mind?

Thanks.
David Ahern Oct. 28, 2020, 3:22 p.m. UTC | #6
On 10/27/20 1:17 AM, Vincent Bernat wrote:
>  ❦ 23 octobre 2020 08:40 -06, David Ahern:
> 
>>> I am wondering if we should revert the patch for 5.10 while we can,
>>> waiting for a better solution (and breaking people relying on the new
>>> behavior in 5.9).
>>>
>>> Then, I can propose a patch with a sysctl to avoid breaking existing
>>> setups.
>>>
>>
>> I have not walked the details, but it seems like a security policy can
>> be installed to get the previous behavior.
> 
> libtorrent is using SO_BINDTODEVICE for some reason (code is quite old,
> so not git history). Previously, the call was unsuccesful and the error
> was logged and ignored. Now, it succeeds and circumvent the routing
> policy. Using Netfiler does not help as libtorrent won't act on dropped
> packets as the socket is already configured on the wrong interface.
> kprobe is unable to modify a syscall and seccomp cannot be applied
> globally. LSM are usually distro specific. What kind of security policy
> do you have in mind?
> 

nothing specific; I was hand waving.

There are bpf hooks to set and unset socket options, but those seem
inconvenient here.

I guess a sysctl is the only practical solution. If you do that we
should have granularity - any device, l3mdev devices only, ...
diff mbox series

Patch

diff --git a/net/core/sock.c b/net/core/sock.c
index da32d9b6d09f..ce1d8dce9b7a 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -574,7 +574,7 @@  static int sock_setbindtodevice_locked(struct sock *sk, int ifindex)
 
 	/* Sorry... */
 	ret = -EPERM;
-	if (!ns_capable(net->user_ns, CAP_NET_RAW))
+	if (sk->sk_bound_dev_if && !ns_capable(net->user_ns, CAP_NET_RAW))
 		goto out;
 
 	ret = -EINVAL;