diff mbox series

net-icmp: make icmp{,v6} (ping) sockets available to all by default

Message ID 20200508234223.118254-1-zenczykowski@gmail.com
State Changes Requested
Delegated to: David Miller
Headers show
Series net-icmp: make icmp{,v6} (ping) sockets available to all by default | expand

Commit Message

Maciej Żenczykowski May 8, 2020, 11:42 p.m. UTC
From: Maciej Żenczykowski <maze@google.com>

This makes 'ping' 'ping6' and icmp based traceroute no longer
require any suid or file capabilities.

These sockets have baked long enough that the restriction
to make them unusable by default is no longer necessary.

The concerns were around exploits.  However there are now
major distros that default to enabling this.

This is already the default on Fedora 31:
  [root@f31vm ~]# cat /proc/sys/net/ipv4/ping_group_range
  0       2147483647
  [root@f31vm ~]# cat /usr/lib/sysctl.d/50-default.conf | egrep -B6 ping_group_range
  # ping(8) without CAP_NET_ADMIN and CAP_NET_RAW
  # The upper limit is set to 2^31-1. Values greater than that get rejected by
  # the kernel because of this definition in linux/include/net/ping.h:
  #   #define GID_T_MAX (((gid_t)~0U) >> 1)
  # That's not so bad because values between 2^31 and 2^32-1 are reserved on
  # systemd-based systems anyway: https://systemd.io/UIDS-GIDS.html#summary
  -net.ipv4.ping_group_range = 0 2147483647

And in general is super useful for any network namespace container
based setup.  See for example: https://docs.docker.com/engine/security/rootless/

This is one less thing you need to configure when you creare a new network
namespace.

Before:
  vm:~# unshare -n
  vm:~# cat /proc/sys/net/ipv4/ping_group_range
  1       0

After:
  vm:~# unshare -n
  vm:~# cat /proc/sys/net/ipv4/ping_group_range
  0       2147483647

Signed-off-by: Maciej Żenczykowski <maze@google.com>
---
 net/ipv4/af_inet.c | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

Comments

Ido Schimmel May 9, 2020, 7:15 p.m. UTC | #1
On Fri, May 08, 2020 at 04:42:23PM -0700, Maciej Żenczykowski wrote:
> From: Maciej Żenczykowski <maze@google.com>
> 
> This makes 'ping' 'ping6' and icmp based traceroute no longer
> require any suid or file capabilities.
> 
> These sockets have baked long enough that the restriction
> to make them unusable by default is no longer necessary.
> 
> The concerns were around exploits.  However there are now
> major distros that default to enabling this.
> 
> This is already the default on Fedora 31:
>   [root@f31vm ~]# cat /proc/sys/net/ipv4/ping_group_range
>   0       2147483647
>   [root@f31vm ~]# cat /usr/lib/sysctl.d/50-default.conf | egrep -B6 ping_group_range
>   # ping(8) without CAP_NET_ADMIN and CAP_NET_RAW
>   # The upper limit is set to 2^31-1. Values greater than that get rejected by
>   # the kernel because of this definition in linux/include/net/ping.h:
>   #   #define GID_T_MAX (((gid_t)~0U) >> 1)
>   # That's not so bad because values between 2^31 and 2^32-1 are reserved on
>   # systemd-based systems anyway: https://systemd.io/UIDS-GIDS.html#summary
>   -net.ipv4.ping_group_range = 0 2147483647
> 
> And in general is super useful for any network namespace container
> based setup.  See for example: https://docs.docker.com/engine/security/rootless/
> 
> This is one less thing you need to configure when you creare a new network
> namespace.
> 
> Before:
>   vm:~# unshare -n
>   vm:~# cat /proc/sys/net/ipv4/ping_group_range
>   1       0
> 
> After:
>   vm:~# unshare -n
>   vm:~# cat /proc/sys/net/ipv4/ping_group_range
>   0       2147483647
> 
> Signed-off-by: Maciej Żenczykowski <maze@google.com>

This will unfortunately cause regressions with VRFs because they don't
work correctly with ping sockets. Simple example:

ip link add name vrf-red up type vrf table 10
ip link add name vrf-blue up type vrf table 20
ip rule add pref 32765 table local
ip rule del pref 0

ip link add name veth-red type veth peer name veth-blue
ip link set dev veth-red up master vrf-red
ip link set dev veth-blue up master vrf-red
ip address add 192.0.2.1/24 dev veth-red
ip address add 192.0.2.2/24 dev veth-blue

ip vrf exec vrf-red ping -I 192.0.2.1 192.0.2.2 -c 1 -w 1

Before the patch:

PING 192.0.2.2 (192.0.2.2) from 192.0.2.1 : 56(84) bytes of data.
64 bytes from 192.0.2.2: icmp_seq=1 ttl=64 time=0.053 ms

After the patch:

bind: Cannot assign requested address

This specific test case is fixed by following patch, but at least a
similar change is needed for IPv6. Other changes might also be required.
Sorry for not providing a complete solution, but I'm busy with other
tasks at the moment. :/

diff --git a/net/ipv4/ping.c b/net/ipv4/ping.c
index 535427292194..8463b0e9e811 100644
--- a/net/ipv4/ping.c
+++ b/net/ipv4/ping.c
@@ -297,6 +297,7 @@ static int ping_check_bind_addr(struct sock *sk, struct inet_sock *isk,
        struct net *net = sock_net(sk);
        if (sk->sk_family == AF_INET) {
                struct sockaddr_in *addr = (struct sockaddr_in *) uaddr;
+               u32 tb_id = RT_TABLE_LOCAL;
                int chk_addr_ret;
 
                if (addr_len < sizeof(*addr))
@@ -310,7 +311,15 @@ static int ping_check_bind_addr(struct sock *sk, struct inet_sock *isk,
                pr_debug("ping_check_bind_addr(sk=%p,addr=%pI4,port=%d)\n",
                         sk, &addr->sin_addr.s_addr, ntohs(addr->sin_port));
 
-               chk_addr_ret = inet_addr_type(net, addr->sin_addr.s_addr);
+               if (sk->sk_bound_dev_if) {
+                       tb_id = l3mdev_fib_table_by_index(net,
+                                                         sk->sk_bound_dev_if);
+                       if (!tb_id)
+                               tb_id = RT_TABLE_LOCAL;
+               }
+
+               chk_addr_ret = inet_addr_type_table(net, addr->sin_addr.s_addr,
+                                                   tb_id);
 
                if (addr->sin_addr.s_addr == htonl(INADDR_ANY))
                        chk_addr_ret = RTN_LOCAL;

> ---
>  net/ipv4/af_inet.c | 8 ++------
>  1 file changed, 2 insertions(+), 6 deletions(-)
> 
> diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
> index cf58e29cf746..1a8cb6f3ee38 100644
> --- a/net/ipv4/af_inet.c
> +++ b/net/ipv4/af_inet.c
> @@ -1819,12 +1819,8 @@ static __net_init int inet_init_net(struct net *net)
>  	net->ipv4.ip_local_ports.range[1] =  60999;
>  
>  	seqlock_init(&net->ipv4.ping_group_range.lock);
> -	/*
> -	 * Sane defaults - nobody may create ping sockets.
> -	 * Boot scripts should set this to distro-specific group.
> -	 */
> -	net->ipv4.ping_group_range.range[0] = make_kgid(&init_user_ns, 1);
> -	net->ipv4.ping_group_range.range[1] = make_kgid(&init_user_ns, 0);
> +	net->ipv4.ping_group_range.range[0] = GLOBAL_ROOT_GID;
> +	net->ipv4.ping_group_range.range[1] = KGIDT_INIT(0x7FFFFFFF);
>  
>  	/* Default values for sysctl-controlled parameters.
>  	 * We set them here, in case sysctl is not compiled.
> -- 
> 2.26.2.645.ge9eca65c58-goog
>
Maciej Żenczykowski May 9, 2020, 7:17 p.m. UTC | #2
Argh.  I've never understood the faintest thing about VRF's.
Ido Schimmel May 9, 2020, 7:32 p.m. UTC | #3
On Sat, May 09, 2020 at 12:17:47PM -0700, Maciej Żenczykowski wrote:
> Argh.  I've never understood the faintest thing about VRF's.

:)

There are many resources that David created over the years. For example:

https://www.kernel.org/doc/Documentation/networking/vrf.txt
https://netdevconf.info/1.1/proceedings/slides/ahern-vrf-tutorial.pdf
https://netdevconf.info/1.2/session.html?david-ahern-talk

BTW, in the example it should be:

ip link set dev veth-blue up master vrf-blue

Instead of:

ip link set dev veth-blue up master vrf-red

(Obviously)
David Ahern May 9, 2020, 9:09 p.m. UTC | #4
On 5/9/20 1:32 PM, Ido Schimmel wrote:
> On Sat, May 09, 2020 at 12:17:47PM -0700, Maciej Żenczykowski wrote:
>> Argh.  I've never understood the faintest thing about VRF's.
> 
> :)
> 
> There are many resources that David created over the years. For example:
> 
> https://www.kernel.org/doc/Documentation/networking/vrf.txt
> https://netdevconf.info/1.1/proceedings/slides/ahern-vrf-tutorial.pdf
> https://netdevconf.info/1.2/session.html?david-ahern-talk

best one is from OSS N.A. 2017:
    http://schd.ws/hosted_files/ossna2017/fe/vrf-tutorial-oss.pdf

Maciej: Basically, VRF narrows the scope to an L3 domain - devices
enslaved to the VRF device, their addresses and FIB entries in the VRF
table.
David Ahern May 9, 2020, 9:20 p.m. UTC | #5
On 5/9/20 1:15 PM, Ido Schimmel wrote:
> 
> This will unfortunately cause regressions with VRFs because they don't
> work correctly with ping sockets. Simple example:

Thanks for catching this, Ido.

> 
> diff --git a/net/ipv4/ping.c b/net/ipv4/ping.c
> index 535427292194..8463b0e9e811 100644
> --- a/net/ipv4/ping.c
> +++ b/net/ipv4/ping.c
> @@ -297,6 +297,7 @@ static int ping_check_bind_addr(struct sock *sk, struct inet_sock *isk,
>         struct net *net = sock_net(sk);
>         if (sk->sk_family == AF_INET) {
>                 struct sockaddr_in *addr = (struct sockaddr_in *) uaddr;
> +               u32 tb_id = RT_TABLE_LOCAL;
>                 int chk_addr_ret;
>  
>                 if (addr_len < sizeof(*addr))
> @@ -310,7 +311,15 @@ static int ping_check_bind_addr(struct sock *sk, struct inet_sock *isk,
>                 pr_debug("ping_check_bind_addr(sk=%p,addr=%pI4,port=%d)\n",
>                          sk, &addr->sin_addr.s_addr, ntohs(addr->sin_port));
>  
> -               chk_addr_ret = inet_addr_type(net, addr->sin_addr.s_addr);
> +               if (sk->sk_bound_dev_if) {

that's the key point - checking sk->sk_bound_dev_if.

> +                       tb_id = l3mdev_fib_table_by_index(net,
> +                                                         sk->sk_bound_dev_if);
> +                       if (!tb_id)
> +                               tb_id = RT_TABLE_LOCAL;
> +               }
> +
> +               chk_addr_ret = inet_addr_type_table(net, addr->sin_addr.s_addr,
> +                                                   tb_id);
>  
>                 if (addr->sin_addr.s_addr == htonl(INADDR_ANY))
>                         chk_addr_ret = RTN_LOCAL;
> 

From a scan of the ipv4/ping.c code, ping_v4_sendmsg also does not
acknowledge sk->sk_bound_dev_if.
Maciej Żenczykowski May 9, 2020, 9:35 p.m. UTC | #6
Do we have some sort of beginner's introduction to Linux VRF somewhere?
What they are? How to use them?

Currently the concept simply doesn't fit into my mental model of networking...

We've actually talked about maybe possibly using VRF's in Android (for
our multi network support)...
but no-one on our team has the faintest idea about how they work...
(and there's rumours that they don't work with ipv6 link local)
David Ahern May 10, 2020, 1:24 a.m. UTC | #7
On 5/9/20 3:35 PM, Maciej Żenczykowski wrote:
> Do we have some sort of beginner's introduction to Linux VRF somewhere?
> What they are? How to use them?

Ido's response gave introductory commands which can also be found here:
    https://www.kernel.org/doc/Documentation/networking/vrf.txt

This should answer most questions about more advanced topics:
    http://schd.ws/hosted_files/ossna2017/fe/vrf-tutorial-oss.pdf

Lately, I am putting blogs on https://people.kernel.org/dsahern for
recurring questions.

> 
> Currently the concept simply doesn't fit into my mental model of networking...

network namespaces = device level separation and up
VRF = Layer 3 and up separation

> 
> We've actually talked about maybe possibly using VRF's in Android (for
> our multi network support)...
> but no-one on our team has the faintest idea about how they work...
> (and there's rumours that they don't work with ipv6 link local)
> 

Rumors are ugly. If in doubt, ask. LLA with VRF is a primary requirement
from the beginning.

With 5.3 and up, you can have IPv4 routes with IPv6 LLA gateways with
and without VRFs.
Maciej Żenczykowski May 10, 2020, 5:15 a.m. UTC | #8
> Ido's response gave introductory commands which can also be found here:
>     https://www.kernel.org/doc/Documentation/networking/vrf.txt
>
> This should answer most questions about more advanced topics:
>     http://schd.ws/hosted_files/ossna2017/fe/vrf-tutorial-oss.pdf
>
> Lately, I am putting blogs on https://people.kernel.org/dsahern for
> recurring questions.

Thanks for that - I'll give it a look.

> Rumors are ugly. If in doubt, ask. LLA with VRF is a primary requirement
> from the beginning.

LLA? Link Level Aggregation?
Not sure what that has to do with VRF though... that's just bonding/teaming??
and seems to work fine without VRF at a (at least to me conceptually)
even lower layer.

> With 5.3 and up, you can have IPv4 routes with IPv6 LLA gateways with
> and without VRFs.

Ah, see... the latest phone hardware that I still don't have access to
(mostly because of covid),
is only 4.19 based (as such doing dev work on 4.14 or VMs, but getting
wifi/cell emulation
and ipv6 working right in VMs is very very hard, though we're making
slow progress).

5.4 hardware is probably 10 months out (assuming things return to normal).
We're always ~2 years behind, and playing catchup... :-(

And I recently switched some of our servers (those were stuck on 4.3
until very recently)
to use IPv4 egress routing via IPv6 ND, that's a great improvement
(and came at just the right time),
but again I guess I don't see how VRF fits in to the picture - this
seems to be just a use NDv6 for foo
instead of ARP for bar to figure out dst mac type of thing...

Obviously I haven't read your links, so perhaps all my questions will
be answered.
(I'm rambling, mostly writing this email to thank you for the links)
David Ahern May 10, 2020, 3:29 p.m. UTC | #9
On 5/9/20 11:15 PM, Maciej Żenczykowski wrote:
>> Rumors are ugly. If in doubt, ask. LLA with VRF is a primary requirement
>> from the beginning.
> 
> LLA? Link Level Aggregation?

Link Local Address
diff mbox series

Patch

diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index cf58e29cf746..1a8cb6f3ee38 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -1819,12 +1819,8 @@  static __net_init int inet_init_net(struct net *net)
 	net->ipv4.ip_local_ports.range[1] =  60999;
 
 	seqlock_init(&net->ipv4.ping_group_range.lock);
-	/*
-	 * Sane defaults - nobody may create ping sockets.
-	 * Boot scripts should set this to distro-specific group.
-	 */
-	net->ipv4.ping_group_range.range[0] = make_kgid(&init_user_ns, 1);
-	net->ipv4.ping_group_range.range[1] = make_kgid(&init_user_ns, 0);
+	net->ipv4.ping_group_range.range[0] = GLOBAL_ROOT_GID;
+	net->ipv4.ping_group_range.range[1] = KGIDT_INIT(0x7FFFFFFF);
 
 	/* Default values for sysctl-controlled parameters.
 	 * We set them here, in case sysctl is not compiled.