diff mbox series

ipvs: allow netlink configuration from non-initial user namespace

Message ID 20240307203107.63815-1-michael.weiss@aisec.fraunhofer.de
State Handled Elsewhere
Headers show
Series ipvs: allow netlink configuration from non-initial user namespace | expand

Commit Message

Michael Weiß March 7, 2024, 8:31 p.m. UTC
Configuring ipvs in a non-initial user namespace using the genl
netlink interface, e.g., by 'ipvsadm' is currently resulting in an
'-EPERM'. This is due to the use of GENL_ADMIN_PERM flag in
'ip_vs_ctl.c'.

Similarly to other genl interfaces, we switch to the use of
GENL_UNS_ADMIN_PERM flag which allows connection from non-initial
user namespace. Thus, it would be feasible to configure ipvs using
the genl interface also from within an unprivileged system container.

Since adding of new services and new dests are triggered from
userspace, accounting for the corresponding memory allocations in
ip_vs_new_dest() and ip_vs_add_service() is activated.

We tested this by simply running some samples from "man ipvsadm"
within an unprivileged user namespaced system container in GyroidOS.
Further, we successfully passed an adapted version of the ipvs
selftest in 'tools/testing/selftests/netfilter/ipvs.sh' using
preliminary created network namespaces from unprivileged GyroidOS
containers.

Signed-off-by: Michael Weiß <michael.weiss@aisec.fraunhofer.de>
---
 net/netfilter/ipvs/ip_vs_ctl.c | 36 +++++++++++++++++-----------------
 1 file changed, 18 insertions(+), 18 deletions(-)

Comments

Julian Anastasov March 8, 2024, 7:55 a.m. UTC | #1
Hello,

On Thu, 7 Mar 2024, Michael Weiß wrote:

> Configuring ipvs in a non-initial user namespace using the genl
> netlink interface, e.g., by 'ipvsadm' is currently resulting in an
> '-EPERM'. This is due to the use of GENL_ADMIN_PERM flag in
> 'ip_vs_ctl.c'.
> 
> Similarly to other genl interfaces, we switch to the use of
> GENL_UNS_ADMIN_PERM flag which allows connection from non-initial
> user namespace. Thus, it would be feasible to configure ipvs using
> the genl interface also from within an unprivileged system container.
> 
> Since adding of new services and new dests are triggered from
> userspace, accounting for the corresponding memory allocations in
> ip_vs_new_dest() and ip_vs_add_service() is activated.
> 
> We tested this by simply running some samples from "man ipvsadm"
> within an unprivileged user namespaced system container in GyroidOS.
> Further, we successfully passed an adapted version of the ipvs
> selftest in 'tools/testing/selftests/netfilter/ipvs.sh' using
> preliminary created network namespaces from unprivileged GyroidOS
> containers.

	I planned such change but as followup patchset to other
work which converts many structures to be per-netns.

	There is a RFC v2 patchset for reference:

https://archive.linuxvirtualserver.org/html/lvs-devel/2023-12/index.html

	My goal was to isolate the different namespaces as much as
possible: different structures, different kthreads, etc. with the
goal to reduce the security risks of giving power to unprivileged roots.
Such isolation should help when namespaces are served from different CPUs.

	May be I should push fresh v3 soon, so that we can later use
GFP_KERNEL_ACCOUNT not only for services and dests but also
for allocations by schedulers, estimators, etc. The access to
sysctl vars should be enabled too, around comment
"Don't export sysctls to unprivileged users",
alloc_percpu => alloc_percpu_gfp(,GFP_KERNEL_ACCOUNT),
SLAB_ACCOUNT for kmem_cache_create, not sure about __GFP_NOWARN and
__GFP_NORETRY usage too.

	Not sure about the sysctl vars: now they are cloned from
init_net, do we give full access for writing, some can be privileged,
etc.

	I didn't push such changes yet because I'm not sure what
is needed: looks like, for now, what was needed is root from init_net to 
control rules in different netns and there was no demand from the 
virtualization world to extend this. If we can clearly define what is 
good and what is bad from security perspective, we can go with such 
changes after pushing the above patchset, i.e. the GENL_UNS_ADMIN_PERM
change should follow all other changes.

> Signed-off-by: Michael Weiß <michael.weiss@aisec.fraunhofer.de>
> ---
>  net/netfilter/ipvs/ip_vs_ctl.c | 36 +++++++++++++++++-----------------
>  1 file changed, 18 insertions(+), 18 deletions(-)
> 
> diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c
> index 143a341bbc0a..d39120c64207 100644
> --- a/net/netfilter/ipvs/ip_vs_ctl.c
> +++ b/net/netfilter/ipvs/ip_vs_ctl.c
> @@ -1080,7 +1080,7 @@ ip_vs_new_dest(struct ip_vs_service *svc, struct ip_vs_dest_user_kern *udest)
>  			return -EINVAL;
>  	}
>  
> -	dest = kzalloc(sizeof(struct ip_vs_dest), GFP_KERNEL);
> +	dest = kzalloc(sizeof(struct ip_vs_dest), GFP_KERNEL_ACCOUNT);
>  	if (dest == NULL)
>  		return -ENOMEM;
>  
> @@ -1421,7 +1421,7 @@ ip_vs_add_service(struct netns_ipvs *ipvs, struct ip_vs_service_user_kern *u,
>  		ret_hooks = ret;
>  	}
>  
> -	svc = kzalloc(sizeof(struct ip_vs_service), GFP_KERNEL);
> +	svc = kzalloc(sizeof(struct ip_vs_service), GFP_KERNEL_ACCOUNT);
>  	if (svc == NULL) {
>  		IP_VS_DBG(1, "%s(): no memory\n", __func__);
>  		ret = -ENOMEM;
> @@ -4139,98 +4139,98 @@ static const struct genl_small_ops ip_vs_genl_ops[] = {
>  	{
>  		.cmd	= IPVS_CMD_NEW_SERVICE,
>  		.validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP,
> -		.flags	= GENL_ADMIN_PERM,
> +		.flags	= GENL_UNS_ADMIN_PERM,
>  		.doit	= ip_vs_genl_set_cmd,
...

Regards

--
Julian Anastasov <ja@ssi.bg>
Michael Weiß March 8, 2024, 11:17 a.m. UTC | #2
On 3/8/24 08:55, Julian Anastasov wrote:
> 
> 	Hello,
> 
> On Thu, 7 Mar 2024, Michael Weiß wrote:
> 
>> Configuring ipvs in a non-initial user namespace using the genl
>> netlink interface, e.g., by 'ipvsadm' is currently resulting in an
>> '-EPERM'. This is due to the use of GENL_ADMIN_PERM flag in
>> 'ip_vs_ctl.c'.
>>
>> Similarly to other genl interfaces, we switch to the use of
>> GENL_UNS_ADMIN_PERM flag which allows connection from non-initial
>> user namespace. Thus, it would be feasible to configure ipvs using
>> the genl interface also from within an unprivileged system container.
>>
>> Since adding of new services and new dests are triggered from
>> userspace, accounting for the corresponding memory allocations in
>> ip_vs_new_dest() and ip_vs_add_service() is activated.
>>
>> We tested this by simply running some samples from "man ipvsadm"
>> within an unprivileged user namespaced system container in GyroidOS.
>> Further, we successfully passed an adapted version of the ipvs
>> selftest in 'tools/testing/selftests/netfilter/ipvs.sh' using
>> preliminary created network namespaces from unprivileged GyroidOS
>> containers.
> 
> 	I planned such change but as followup patchset to other
> work which converts many structures to be per-netns.
> 
> 	There is a RFC v2 patchset for reference:
> 
> https://archive.linuxvirtualserver.org/html/lvs-devel/2023-12/index.html
> 
> 	My goal was to isolate the different namespaces as much as
> possible: different structures, different kthreads, etc. with the
> goal to reduce the security risks of giving power to unprivileged roots.
> Such isolation should help when namespaces are served from different CPUs.
> 
> 	May be I should push fresh v3 soon, so that we can later use
> GFP_KERNEL_ACCOUNT not only for services and dests but also
> for allocations by schedulers, estimators, etc. The access to
> sysctl vars should be enabled too, around comment
> "Don't export sysctls to unprivileged users",
> alloc_percpu => alloc_percpu_gfp(,GFP_KERNEL_ACCOUNT),
> SLAB_ACCOUNT for kmem_cache_create, not sure about __GFP_NOWARN and
> __GFP_NORETRY usage too.
> 
> 	Not sure about the sysctl vars: now they are cloned from
> init_net, do we give full access for writing, some can be privileged,
> etc.

Oh yes your right that I have missed. I think non-global per-netns
sysctls should be save to be allowed for unprivileged roots.
sysfs can only be mounted rw in a new private netns. Just unsharing the
userns which then does not own the netns, means that sysfs can only be
mounted read-only or the already mounted sysfs gets the overflow uids.

> 
> 	I didn't push such changes yet because I'm not sure what
> is needed: looks like, for now, what was needed is root from init_net to 
> control rules in different netns and there was no demand from the 
> virtualization world to extend this. If we can clearly define what is 
> good and what is bad from security perspective, we can go with such 
> changes after pushing the above patchset, i.e. the GENL_UNS_ADMIN_PERM
> change should follow all other changes.

Just to clarify the need to control ipvs from within non-init userns:
In GyroidOS, we have only a minimal ramdisk-based system containing a
container manager in the initial namespace. A first "core" container is
already running in an unprivileged user namespace with a private netns.
This container gets all physical network devices moved into its netns
and is responsible for routing and all other networking features,
ovs, batman, l2tp, etc... and also ipvs.
	
Nevertheless, it makes sense to wait for your changes to be merged first.

Thanx,
Michael


> 
>> Signed-off-by: Michael Weiß <michael.weiss@aisec.fraunhofer.de>
>> ---
>>  net/netfilter/ipvs/ip_vs_ctl.c | 36 +++++++++++++++++-----------------
>>  1 file changed, 18 insertions(+), 18 deletions(-)
>>
>> diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c
>> index 143a341bbc0a..d39120c64207 100644
>> --- a/net/netfilter/ipvs/ip_vs_ctl.c
>> +++ b/net/netfilter/ipvs/ip_vs_ctl.c
>> @@ -1080,7 +1080,7 @@ ip_vs_new_dest(struct ip_vs_service *svc, struct ip_vs_dest_user_kern *udest)
>>  			return -EINVAL;
>>  	}
>>  
>> -	dest = kzalloc(sizeof(struct ip_vs_dest), GFP_KERNEL);
>> +	dest = kzalloc(sizeof(struct ip_vs_dest), GFP_KERNEL_ACCOUNT);
>>  	if (dest == NULL)
>>  		return -ENOMEM;
>>  
>> @@ -1421,7 +1421,7 @@ ip_vs_add_service(struct netns_ipvs *ipvs, struct ip_vs_service_user_kern *u,
>>  		ret_hooks = ret;
>>  	}
>>  
>> -	svc = kzalloc(sizeof(struct ip_vs_service), GFP_KERNEL);
>> +	svc = kzalloc(sizeof(struct ip_vs_service), GFP_KERNEL_ACCOUNT);
>>  	if (svc == NULL) {
>>  		IP_VS_DBG(1, "%s(): no memory\n", __func__);
>>  		ret = -ENOMEM;
>> @@ -4139,98 +4139,98 @@ static const struct genl_small_ops ip_vs_genl_ops[] = {
>>  	{
>>  		.cmd	= IPVS_CMD_NEW_SERVICE,
>>  		.validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP,
>> -		.flags	= GENL_ADMIN_PERM,
>> +		.flags	= GENL_UNS_ADMIN_PERM,
>>  		.doit	= ip_vs_genl_set_cmd,
> ...
> 
> Regards
> 
> --
> Julian Anastasov <ja@ssi.bg>
diff mbox series

Patch

diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c
index 143a341bbc0a..d39120c64207 100644
--- a/net/netfilter/ipvs/ip_vs_ctl.c
+++ b/net/netfilter/ipvs/ip_vs_ctl.c
@@ -1080,7 +1080,7 @@  ip_vs_new_dest(struct ip_vs_service *svc, struct ip_vs_dest_user_kern *udest)
 			return -EINVAL;
 	}
 
-	dest = kzalloc(sizeof(struct ip_vs_dest), GFP_KERNEL);
+	dest = kzalloc(sizeof(struct ip_vs_dest), GFP_KERNEL_ACCOUNT);
 	if (dest == NULL)
 		return -ENOMEM;
 
@@ -1421,7 +1421,7 @@  ip_vs_add_service(struct netns_ipvs *ipvs, struct ip_vs_service_user_kern *u,
 		ret_hooks = ret;
 	}
 
-	svc = kzalloc(sizeof(struct ip_vs_service), GFP_KERNEL);
+	svc = kzalloc(sizeof(struct ip_vs_service), GFP_KERNEL_ACCOUNT);
 	if (svc == NULL) {
 		IP_VS_DBG(1, "%s(): no memory\n", __func__);
 		ret = -ENOMEM;
@@ -4139,98 +4139,98 @@  static const struct genl_small_ops ip_vs_genl_ops[] = {
 	{
 		.cmd	= IPVS_CMD_NEW_SERVICE,
 		.validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP,
-		.flags	= GENL_ADMIN_PERM,
+		.flags	= GENL_UNS_ADMIN_PERM,
 		.doit	= ip_vs_genl_set_cmd,
 	},
 	{
 		.cmd	= IPVS_CMD_SET_SERVICE,
 		.validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP,
-		.flags	= GENL_ADMIN_PERM,
+		.flags	= GENL_UNS_ADMIN_PERM,
 		.doit	= ip_vs_genl_set_cmd,
 	},
 	{
 		.cmd	= IPVS_CMD_DEL_SERVICE,
 		.validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP,
-		.flags	= GENL_ADMIN_PERM,
+		.flags	= GENL_UNS_ADMIN_PERM,
 		.doit	= ip_vs_genl_set_cmd,
 	},
 	{
 		.cmd	= IPVS_CMD_GET_SERVICE,
 		.validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP,
-		.flags	= GENL_ADMIN_PERM,
+		.flags	= GENL_UNS_ADMIN_PERM,
 		.doit	= ip_vs_genl_get_cmd,
 		.dumpit	= ip_vs_genl_dump_services,
 	},
 	{
 		.cmd	= IPVS_CMD_NEW_DEST,
 		.validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP,
-		.flags	= GENL_ADMIN_PERM,
+		.flags	= GENL_UNS_ADMIN_PERM,
 		.doit	= ip_vs_genl_set_cmd,
 	},
 	{
 		.cmd	= IPVS_CMD_SET_DEST,
 		.validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP,
-		.flags	= GENL_ADMIN_PERM,
+		.flags	= GENL_UNS_ADMIN_PERM,
 		.doit	= ip_vs_genl_set_cmd,
 	},
 	{
 		.cmd	= IPVS_CMD_DEL_DEST,
 		.validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP,
-		.flags	= GENL_ADMIN_PERM,
+		.flags	= GENL_UNS_ADMIN_PERM,
 		.doit	= ip_vs_genl_set_cmd,
 	},
 	{
 		.cmd	= IPVS_CMD_GET_DEST,
 		.validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP,
-		.flags	= GENL_ADMIN_PERM,
+		.flags	= GENL_UNS_ADMIN_PERM,
 		.dumpit	= ip_vs_genl_dump_dests,
 	},
 	{
 		.cmd	= IPVS_CMD_NEW_DAEMON,
 		.validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP,
-		.flags	= GENL_ADMIN_PERM,
+		.flags	= GENL_UNS_ADMIN_PERM,
 		.doit	= ip_vs_genl_set_daemon,
 	},
 	{
 		.cmd	= IPVS_CMD_DEL_DAEMON,
 		.validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP,
-		.flags	= GENL_ADMIN_PERM,
+		.flags	= GENL_UNS_ADMIN_PERM,
 		.doit	= ip_vs_genl_set_daemon,
 	},
 	{
 		.cmd	= IPVS_CMD_GET_DAEMON,
 		.validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP,
-		.flags	= GENL_ADMIN_PERM,
+		.flags	= GENL_UNS_ADMIN_PERM,
 		.dumpit	= ip_vs_genl_dump_daemons,
 	},
 	{
 		.cmd	= IPVS_CMD_SET_CONFIG,
 		.validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP,
-		.flags	= GENL_ADMIN_PERM,
+		.flags	= GENL_UNS_ADMIN_PERM,
 		.doit	= ip_vs_genl_set_cmd,
 	},
 	{
 		.cmd	= IPVS_CMD_GET_CONFIG,
 		.validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP,
-		.flags	= GENL_ADMIN_PERM,
+		.flags	= GENL_UNS_ADMIN_PERM,
 		.doit	= ip_vs_genl_get_cmd,
 	},
 	{
 		.cmd	= IPVS_CMD_GET_INFO,
 		.validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP,
-		.flags	= GENL_ADMIN_PERM,
+		.flags	= GENL_UNS_ADMIN_PERM,
 		.doit	= ip_vs_genl_get_cmd,
 	},
 	{
 		.cmd	= IPVS_CMD_ZERO,
 		.validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP,
-		.flags	= GENL_ADMIN_PERM,
+		.flags	= GENL_UNS_ADMIN_PERM,
 		.doit	= ip_vs_genl_set_cmd,
 	},
 	{
 		.cmd	= IPVS_CMD_FLUSH,
 		.validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP,
-		.flags	= GENL_ADMIN_PERM,
+		.flags	= GENL_UNS_ADMIN_PERM,
 		.doit	= ip_vs_genl_set_cmd,
 	},
 };