diff mbox

[net-next,09/17] net: Allow userns root control of the core of the network stack.

Message ID 1353070992-5552-9-git-send-email-ebiederm@xmission.com
State Accepted, archived
Delegated to: David Miller
Headers show

Commit Message

Eric W. Biederman Nov. 16, 2012, 1:03 p.m. UTC
From: "Eric W. Biederman" <ebiederm@xmission.com>

Allow an unpriviled user who has created a user namespace, and then
created a network namespace to effectively use the new network
namespace, by reducing capable(CAP_NET_ADMIN) and
capable(CAP_NET_RAW) calls to be ns_capable(net->user_ns,
CAP_NET_ADMIN), or capable(net->user_ns, CAP_NET_RAW) calls.

Settings that merely control a single network device are allowed.
Either the network device is a logical network device where
restrictions make no difference or the network device is hardware NIC
that has been explicity moved from the initial network namespace.

In general policy and network stack state changes are allowed
while resource control is left unchanged.

Allow ethtool ioctls.

Allow binding to network devices.
Allow setting the socket mark.
Allow setting the socket priority.

Allow setting the network device alias via sysfs.
Allow setting the mtu via sysfs.
Allow changing the network device flags via sysfs.
Allow setting the network device group via sysfs.

Allow the following network device ioctls.
SIOCGMIIPHY
SIOCGMIIREG
SIOCSIFNAME
SIOCSIFFLAGS
SIOCSIFMETRIC
SIOCSIFMTU
SIOCSIFHWADDR
SIOCSIFSLAVE
SIOCADDMULTI
SIOCDELMULTI
SIOCSIFHWBROADCAST
SIOCSMIIREG
SIOCBONDENSLAVE
SIOCBONDRELEASE
SIOCBONDSETHWADDR
SIOCBONDCHANGEACTIVE
SIOCBRADDIF
SIOCBRDELIF
SIOCSHWTSTAMP

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 net/core/dev.c       |   17 +++++++++++++----
 net/core/ethtool.c   |    2 +-
 net/core/net-sysfs.c |   15 ++++++++++-----
 net/core/sock.c      |    7 ++++---
 4 files changed, 28 insertions(+), 13 deletions(-)

Comments

Glauber Costa Nov. 16, 2012, 1:55 p.m. UTC | #1
On 11/16/2012 05:03 PM, Eric W. Biederman wrote:
> +	if (!capable(CAP_NET_ADMIN))
> +		return -EPERM;
> +
>  	return netdev_store(dev, attr, buf, len, change_tx_queue_len);

You mean ns_capable here?

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric W. Biederman Nov. 16, 2012, 2:32 p.m. UTC | #2
Glauber Costa <glommer@parallels.com> writes:

> On 11/16/2012 05:03 PM, Eric W. Biederman wrote:
>> +	if (!capable(CAP_NET_ADMIN))
>> +		return -EPERM;
>> +
>>  	return netdev_store(dev, attr, buf, len, change_tx_queue_len);
>
> You mean ns_capable here?

No.  There I meant capable.

I deliberately call capable here because I don't understand what
the tx_queue_len well enough to be certain it is safe to relax
that check to be just ns_capable.

My get feel is that allowing an unprivileged user to be able to
arbitrarily change the tx_queue_len on a networking device would be a
nice way to allow queuing as many network packets as you would like with
kernel memory and DOSing the machine.

So since with a quick read of the code I could not convince myself it
was safe to allow unprivilged users to change tx_queue_len I left it
protected by capable.  While at the same time I relaxed the check in
netdev_store to be ns_capable.

Eric


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ben Hutchings Nov. 17, 2012, 12:28 a.m. UTC | #3
On Fri, 2012-11-16 at 06:32 -0800, Eric W. Biederman wrote:
> Glauber Costa <glommer@parallels.com> writes:
> 
> > On 11/16/2012 05:03 PM, Eric W. Biederman wrote:
> >> +	if (!capable(CAP_NET_ADMIN))
> >> +		return -EPERM;
> >> +
> >>  	return netdev_store(dev, attr, buf, len, change_tx_queue_len);
> >
> > You mean ns_capable here?
> 
> No.  There I meant capable.
> 
> I deliberately call capable here because I don't understand what
> the tx_queue_len well enough to be certain it is safe to relax
> that check to be just ns_capable.
> 
> My get feel is that allowing an unprivileged user to be able to
> arbitrarily change the tx_queue_len on a networking device would be a
> nice way to allow queuing as many network packets as you would like with
> kernel memory and DOSing the machine.
> 
> So since with a quick read of the code I could not convince myself it
> was safe to allow unprivilged users to change tx_queue_len I left it
> protected by capable.  While at the same time I relaxed the check in
> netdev_store to be ns_capable.

Tor the same reason you had better be very selective about which ethtool
commands are allowed based on per-user_ns CAP_NET_ADMIN.  Consider for a
start:

ETHTOOL_SMSGLVL => fill up the system log
ETHTOOL_SEEPROM => brick the NIC
ETHTOOL_FLASHDEV => brick the NIC; own the system if it's not using an IOMMU

Ben.
Eric W. Biederman Nov. 17, 2012, 2:46 a.m. UTC | #4
Ben Hutchings <bhutchings@solarflare.com> writes:

> On Fri, 2012-11-16 at 06:32 -0800, Eric W. Biederman wrote:
>> Glauber Costa <glommer@parallels.com> writes:
>> 
>> > On 11/16/2012 05:03 PM, Eric W. Biederman wrote:
>> >> +	if (!capable(CAP_NET_ADMIN))
>> >> +		return -EPERM;
>> >> +
>> >>  	return netdev_store(dev, attr, buf, len, change_tx_queue_len);
>> >
>> > You mean ns_capable here?
>> 
>> No.  There I meant capable.
>> 
>> I deliberately call capable here because I don't understand what
>> the tx_queue_len well enough to be certain it is safe to relax
>> that check to be just ns_capable.
>> 
>> My get feel is that allowing an unprivileged user to be able to
>> arbitrarily change the tx_queue_len on a networking device would be a
>> nice way to allow queuing as many network packets as you would like with
>> kernel memory and DOSing the machine.
>> 
>> So since with a quick read of the code I could not convince myself it
>> was safe to allow unprivilged users to change tx_queue_len I left it
>> protected by capable.  While at the same time I relaxed the check in
>> netdev_store to be ns_capable.
>
> Tor the same reason you had better be very selective about which ethtool
> commands are allowed based on per-user_ns CAP_NET_ADMIN.  Consider for a
> start:
>
> ETHTOOL_SEEPROM => brick the NIC
> ETHTOOL_FLASHDEV => brick the NIC; own the system if it's not using an IOMMU

These are prevented by not having access to real hardware by default. A
physical network interface must be moved into a network namespace for
you to have access to it.

There are a handful of software network devices that are generally safe
macvlan, veth, tun, ipip tunnels, etc.  Using those network devices is
very interesting and about as performant as you can get while still
being safe.

A buffer overflow in an ethtool command looks as likely to me as being
able to own the system by reflashing the NIC.

Access to a real physical NIC is an act of trust.  Given the general
linux policy that drivers are merged when they mostly work I don't
currently know of any trust models between "I trust you with full access
to this device" and "I don't trust you with direct access to this
device" that I would feel confident giving to an untrusted user.

Which is a convoluted way of saying "ip link set eth0 netns bob" is the
moral equivalent of "chown bob.bob /dev/eth0; chmod u+rwx /dev/eth0"

> ETHTOOL_SMSGLVL => fill up the system log

That one might be worth doing something about, as there is non-local
effect.  Still I don't believe for any of the software based "safe"
networking devices ETHTOOL_SMSGLVL has any effect, and being able to
tweak the debug level could be important for debugging if you do have
direct access to the NIC.

Eric
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ben Hutchings Nov. 21, 2012, 6:29 p.m. UTC | #5
On Fri, 2012-11-16 at 18:46 -0800, Eric W. Biederman wrote:
> Ben Hutchings <bhutchings@solarflare.com> writes:
> 
> > On Fri, 2012-11-16 at 06:32 -0800, Eric W. Biederman wrote:
> >> Glauber Costa <glommer@parallels.com> writes:
> >> 
> >> > On 11/16/2012 05:03 PM, Eric W. Biederman wrote:
> >> >> +	if (!capable(CAP_NET_ADMIN))
> >> >> +		return -EPERM;
> >> >> +
> >> >>  	return netdev_store(dev, attr, buf, len, change_tx_queue_len);
> >> >
> >> > You mean ns_capable here?
> >> 
> >> No.  There I meant capable.
> >> 
> >> I deliberately call capable here because I don't understand what
> >> the tx_queue_len well enough to be certain it is safe to relax
> >> that check to be just ns_capable.
> >> 
> >> My get feel is that allowing an unprivileged user to be able to
> >> arbitrarily change the tx_queue_len on a networking device would be a
> >> nice way to allow queuing as many network packets as you would like with
> >> kernel memory and DOSing the machine.
> >> 
> >> So since with a quick read of the code I could not convince myself it
> >> was safe to allow unprivilged users to change tx_queue_len I left it
> >> protected by capable.  While at the same time I relaxed the check in
> >> netdev_store to be ns_capable.
> >
> > Tor the same reason you had better be very selective about which ethtool
> > commands are allowed based on per-user_ns CAP_NET_ADMIN.  Consider for a
> > start:
> >
> > ETHTOOL_SEEPROM => brick the NIC
> > ETHTOOL_FLASHDEV => brick the NIC; own the system if it's not using an IOMMU
> 
> These are prevented by not having access to real hardware by default. A
> physical network interface must be moved into a network namespace for
> you to have access to it.

Yes, I realise that.  The question is whether you would expect anything
in a container to be able to do those things, even with a physical net
device assigned to it.

Actually we have the same issue without considering containers - should
CAP_NET_ADMIN really give you low-level control over hardware just
because it's networking hardware?  I think some of these ethtool
operations, and access to non-standard MDIO registers, should perhaps
require an additional capability (CAP_SYS_ADMIN or CAP_SYS_RAWIO?).

> There are a handful of software network devices that are generally safe
> macvlan, veth, tun, ipip tunnels, etc.  Using those network devices is
> very interesting and about as performant as you can get while still
> being safe.
> 
> A buffer overflow in an ethtool command looks as likely to me as being
> able to own the system by reflashing the NIC.

Sure, if you can find one.  But on many NICs the firmware can perform
more or less arbitrary DMA *by design* (one reason for using IOMMUs),
and the ability to update the firmware is not a bug to be fixed!

> Access to a real physical NIC is an act of trust.  Given the general
> linux policy that drivers are merged when they mostly work I don't
> currently know of any trust models between "I trust you with full access
> to this device" and "I don't trust you with direct access to this
> device" that I would feel confident giving to an untrusted user.

At the moment it's 'I trust you with full access to *all* network
devices' (init ns CAP_NET_ADMIN), 'I trust you with some reconfiguration
of these network devices' (other ns CAP_NET_ADMIN) and 'I don't trust
you...'

You're expanding what other-ns-CAP_NET_ADMIN means, to 'I trust you with
full access to these network devices'.

> Which is a convoluted way of saying "ip link set eth0 netns bob" is the
> moral equivalent of "chown bob.bob /dev/eth0; chmod u+rwx /dev/eth0"
[...]

And it's previously been decided that ownership of a block device still
should *not* mean full control over it (see responses to CVE-2011-4127).

Ben.
diff mbox

Patch

diff --git a/net/core/dev.c b/net/core/dev.c
index 09cb3f6..7150ea9 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5200,7 +5200,7 @@  int dev_ioctl(struct net *net, unsigned int cmd, void __user *arg)
 	case SIOCGMIIPHY:
 	case SIOCGMIIREG:
 	case SIOCSIFNAME:
-		if (!capable(CAP_NET_ADMIN))
+		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
 			return -EPERM;
 		dev_load(net, ifr.ifr_name);
 		rtnl_lock();
@@ -5221,16 +5221,25 @@  int dev_ioctl(struct net *net, unsigned int cmd, void __user *arg)
 	 *	- require strict serialization.
 	 *	- do not return a value
 	 */
+	case SIOCSIFMAP:
+	case SIOCSIFTXQLEN:
+		if (!capable(CAP_NET_ADMIN))
+			return -EPERM;
+		/* fall through */
+	/*
+	 *	These ioctl calls:
+	 *	- require local superuser power.
+	 *	- require strict serialization.
+	 *	- do not return a value
+	 */
 	case SIOCSIFFLAGS:
 	case SIOCSIFMETRIC:
 	case SIOCSIFMTU:
-	case SIOCSIFMAP:
 	case SIOCSIFHWADDR:
 	case SIOCSIFSLAVE:
 	case SIOCADDMULTI:
 	case SIOCDELMULTI:
 	case SIOCSIFHWBROADCAST:
-	case SIOCSIFTXQLEN:
 	case SIOCSMIIREG:
 	case SIOCBONDENSLAVE:
 	case SIOCBONDRELEASE:
@@ -5239,7 +5248,7 @@  int dev_ioctl(struct net *net, unsigned int cmd, void __user *arg)
 	case SIOCBRADDIF:
 	case SIOCBRDELIF:
 	case SIOCSHWTSTAMP:
-		if (!capable(CAP_NET_ADMIN))
+		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
 			return -EPERM;
 		/* fall through */
 	case SIOCBONDSLAVEINFOQUERY:
diff --git a/net/core/ethtool.c b/net/core/ethtool.c
index 4d64cc2..a870543 100644
--- a/net/core/ethtool.c
+++ b/net/core/ethtool.c
@@ -1460,7 +1460,7 @@  int dev_ethtool(struct net *net, struct ifreq *ifr)
 	case ETHTOOL_GEEE:
 		break;
 	default:
-		if (!capable(CAP_NET_ADMIN))
+		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
 			return -EPERM;
 	}
 
diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index bcf02f6..c66b8c2 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -73,11 +73,12 @@  static ssize_t netdev_store(struct device *dev, struct device_attribute *attr,
 			    const char *buf, size_t len,
 			    int (*set)(struct net_device *, unsigned long))
 {
-	struct net_device *net = to_net_dev(dev);
+	struct net_device *netdev = to_net_dev(dev);
+	struct net *net = dev_net(netdev);
 	unsigned long new;
 	int ret = -EINVAL;
 
-	if (!capable(CAP_NET_ADMIN))
+	if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	ret = kstrtoul(buf, 0, &new);
@@ -87,8 +88,8 @@  static ssize_t netdev_store(struct device *dev, struct device_attribute *attr,
 	if (!rtnl_trylock())
 		return restart_syscall();
 
-	if (dev_isalive(net)) {
-		if ((ret = (*set)(net, new)) == 0)
+	if (dev_isalive(netdev)) {
+		if ((ret = (*set)(netdev, new)) == 0)
 			ret = len;
 	}
 	rtnl_unlock();
@@ -264,6 +265,9 @@  static ssize_t store_tx_queue_len(struct device *dev,
 				  struct device_attribute *attr,
 				  const char *buf, size_t len)
 {
+	if (!capable(CAP_NET_ADMIN))
+		return -EPERM;
+
 	return netdev_store(dev, attr, buf, len, change_tx_queue_len);
 }
 
@@ -271,10 +275,11 @@  static ssize_t store_ifalias(struct device *dev, struct device_attribute *attr,
 			     const char *buf, size_t len)
 {
 	struct net_device *netdev = to_net_dev(dev);
+	struct net *net = dev_net(netdev);
 	size_t count = len;
 	ssize_t ret;
 
-	if (!capable(CAP_NET_ADMIN))
+	if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	/* ignore trailing newline */
diff --git a/net/core/sock.c b/net/core/sock.c
index 8a146cf..85d75cb 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -515,7 +515,7 @@  static int sock_bindtodevice(struct sock *sk, char __user *optval, int optlen)
 
 	/* Sorry... */
 	ret = -EPERM;
-	if (!capable(CAP_NET_RAW))
+	if (!ns_capable(net->user_ns, CAP_NET_RAW))
 		goto out;
 
 	ret = -EINVAL;
@@ -696,7 +696,8 @@  set_rcvbuf:
 		break;
 
 	case SO_PRIORITY:
-		if ((val >= 0 && val <= 6) || capable(CAP_NET_ADMIN))
+		if ((val >= 0 && val <= 6) ||
+		    ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
 			sk->sk_priority = val;
 		else
 			ret = -EPERM;
@@ -813,7 +814,7 @@  set_rcvbuf:
 			clear_bit(SOCK_PASSSEC, &sock->flags);
 		break;
 	case SO_MARK:
-		if (!capable(CAP_NET_ADMIN))
+		if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
 			ret = -EPERM;
 		else
 			sk->sk_mark = val;