[v2,net-next,0/7] Make /sys/class/net per net namespace objects belong to container

Message ID 1531497949-1766-1-git-send-email-tyhicks@canonical.com

Message

Tyler Hicks July 13, 2018, 4:05 p.m. UTC
This is a revival of an older patch set from Dmitry Torokhov:

 https://lore.kernel.org/lkml/1471386795-32918-1-git-send-email-dmitry.torokhov@gmail.com/

Here's Dmitry's description:

 There are objects in /sys hierarchy (/sys/class/net/) that logically
 belong to a namespace/container. Unfortunately all sysfs objects start
 their life belonging to global root, and while we could change
 ownership manually, keeping track of all objects that come and go is
 cumbersome. It would be better if kernel created them using correct
 uid/gid from the beginning. 

 This series changes kernfs to allow creating objects with arbitrary
 uid/gid, adds get_ownership() callback to ktype structure so subsystems
 could supply their own logic (likely tied to namespace support) for
 determining ownership of kobjects, and adjusts sysfs code to make use
 of this information. Lastly net-sysfs is adjusted to make sure that
 objects in net namespace are owned by the root user from the owning
 user namespace.

 Note that we do not adjust ownership of objects moved into a new
 namespace (as when moving a network device into a container) as
 userspace can easily do it.
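
A minimal sketch of the get_ownership() idea from the description above, written as a userspace model rather than kernel code. Every name here (`model_kobj`, `model_ktype`, `model_kobj_get_ownership`, `model_net_get_ownership`, the `ns_root_uid`/`ns_root_gid` fields) is hypothetical and chosen for illustration; the real structures and signatures are in the patches themselves. The assumption it illustrates is that ownership defaults to global root and a subsystem callback may override it.

```c
#include <stddef.h>

/* Userspace model, not kernel code: every name below is hypothetical. */
typedef unsigned int model_uid_t;
typedef unsigned int model_gid_t;

#define MODEL_GLOBAL_ROOT_UID 0u
#define MODEL_GLOBAL_ROOT_GID 0u

struct model_kobj;

/* Stand-in for a ktype: the subsystem may supply an ownership callback. */
struct model_ktype {
    void (*get_ownership)(const struct model_kobj *kobj,
                          model_uid_t *uid, model_gid_t *gid);
};

struct model_kobj {
    const struct model_ktype *ktype;
    /* Host uid/gid that root of the owning user namespace maps to. */
    model_uid_t ns_root_uid;
    model_gid_t ns_root_gid;
};

/* Core helper: default to global root, then let the ktype override. */
static void model_kobj_get_ownership(const struct model_kobj *kobj,
                                     model_uid_t *uid, model_gid_t *gid)
{
    *uid = MODEL_GLOBAL_ROOT_UID;
    *gid = MODEL_GLOBAL_ROOT_GID;
    if (kobj->ktype && kobj->ktype->get_ownership)
        kobj->ktype->get_ownership(kobj, uid, gid);
}

/* What a net-sysfs style callback might do: report the namespace root. */
static void model_net_get_ownership(const struct model_kobj *kobj,
                                    model_uid_t *uid, model_gid_t *gid)
{
    *uid = kobj->ns_root_uid;
    *gid = kobj->ns_root_gid;
}
```

In this model, an object whose ktype has no callback stays owned by uid/gid 0, while one using `model_net_get_ownership` is created with the container root's ids from the start, so nothing needs to chown it afterwards.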

I'm reviving this patch set because we would like this feature for
system containers. One specific use case that we have is that libvirt is
unable to configure its bridge device inside of a system container due
to the bridge files in /sys/class/net/ being owned by init root instead
of container root. The last two patches in this set are patches that
I've added to Dmitry's original set to allow such configuration of the
bridge device.

Eric had previously provided feedback that he didn't favor these changes
affecting all layers of the stack and that most of the changes could
remain local to drivers/base/core.c. That feedback is certainly sensible
but I wanted to send out v2 of the patch set without making that large
of a change since quite a bit of time has passed and the bridge changes
in the last patch of this set show that not all of the changes will be
local to drivers/base/core.c. I'm happy to make the changes if the
original request still stands.

I've verified that all of the bridge related files affected by patch 7
have proper access control checks for CAP_NET_ADMIN inside of the
user namespace. I have *not* yet verified that all of the network
device related sysfs files affected by patch 5 have proper access
control checks. I was working under the assumption that those code paths
already were verified when the first iteration of the patches was sent
out.
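
To make the concern concrete, here is a rough userspace sketch of why the handlers need their own checks once the files are owned by container root: file ownership alone then lets that user open the attribute for writing, so the store path has to test the capability against the namespace that owns the device. All names (`model_user_ns`, `model_task`, `model_netdev`, `model_ns_capable`, `model_attr_store`) are hypothetical, and the capability test is deliberately simplified to an exact namespace match, which is narrower than the kernel's real rules.

```c
#include <errno.h>
#include <stdbool.h>

/* Userspace model, not kernel code: every name below is hypothetical. */
struct model_user_ns {
    int id;
};

struct model_task {
    const struct model_user_ns *user_ns;
    bool has_net_admin; /* capability held inside the task's own user_ns */
};

struct model_netdev {
    const struct model_user_ns *owner_ns; /* user_ns owning the device's netns */
    int value;                            /* stand-in for a tunable attribute */
};

/* Simplified stand-in for a namespaced capability check. */
static bool model_ns_capable(const struct model_task *task,
                             const struct model_user_ns *target)
{
    return task->has_net_admin && task->user_ns == target;
}

/* Store handler: refuse the write unless the caller is privileged
 * over the namespace that owns the device. */
static int model_attr_store(const struct model_task *task,
                            struct model_netdev *dev, int new_value)
{
    if (!model_ns_capable(task, dev->owner_ns))
        return -EPERM;
    dev->value = new_value;
    return 0;
}
```

A handler that skipped the `model_ns_capable()` test would rely on file permissions alone, which is exactly what changes when ownership moves to container root.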

* Changes since v1:
  - Patch 1 was forward ported to use idr instead of ida for the inode
    num
  - Patch 5 was forward ported around the ro_after_init changes
  - Patch 5 received a build failure fix for !CONFIG_SYSFS
  - Patch 6 and 7 are new

Thanks!

Tyler

Comments

David Miller July 16, 2018, 8:58 p.m. UTC | #1
From: Tyler Hicks <tyhicks@canonical.com>
Date: Fri, 13 Jul 2018 16:05:42 +0000

> Eric had previously provided feedback that he didn't favor these changes
> affecting all layers of the stack and that most of the changes could
> remain local to drivers/base/core.c. That feedback is certainly sensible
> but I wanted to send out v2 of the patch set without making that large
> of a change since quite a bit of time has passed and the bridge changes
> in the last patch of this set show that not all of the changes will be
> local to drivers/base/core.c. I'm happy to make the changes if the
> original request still stands.

I'd like to give Eric an opportunity to review this and give feedback
before applying.

Thanks.

David Miller July 18, 2018, 4:17 a.m. UTC | #2
From: Tyler Hicks <tyhicks@canonical.com>
Date: Fri, 13 Jul 2018 16:05:42 +0000

> I'm reviving this patch set because we would like this feature for
> system containers. One specific use case that we have is that libvirt is
> unable to configure its bridge device inside of a system container due
> to the bridge files in /sys/class/net/ being owned by init root instead
> of container root. The last two patches in this set are patches that
> I've added to Dmitry's original set to allow such configuration of the
> bridge device.
> 
> Eric had previously provided feedback that he didn't favor these changes
> affecting all layers of the stack and that most of the changes could
> remain local to drivers/base/core.c. That feedback is certainly sensible
> but I wanted to send out v2 of the patch set without making that large
> of a change since quite a bit of time has passed and the bridge changes
> in the last patch of this set show that not all of the changes will be
> local to drivers/base/core.c. I'm happy to make the changes if the
> original request still stands.
> 
> I've verified that all of the bridge related files affected by patch 7
> have proper access control checks for CAP_NET_ADMIN inside of the
> user namespace. I have *not* yet verified that all of the network
> device related sysfs files affected by patch 5 have proper access
> control checks. I was working under the assumption that those code paths
> already were verified when the first iteration of the patches was sent
> out.

Ok, I can't let this series rot forever, so I'll apply it to net-next.

Thank you.

David Miller July 18, 2018, 4:41 a.m. UTC | #3
From: David Miller <davem@davemloft.net>
Date: Wed, 18 Jul 2018 13:17:34 +0900 (KST)

> Ok, I can't let this series rot forever, so I'll apply it to net-next.

Unfortunately, I had to revert, this breaks the build:

arch/x86/kernel/cpu/intel_rdt_rdtgroup.c:1506:7: error: too few arguments to function ‘__kernfs_create_file’
  kn = __kernfs_create_file(parent_kn, name, 0444, 0,

Tyler Hicks July 19, 2018, 1:07 a.m. UTC | #4
On 07/17/2018 11:41 PM, David Miller wrote:
> From: David Miller <davem@davemloft.net>
> Date: Wed, 18 Jul 2018 13:17:34 +0900 (KST)
> 
>> Ok, I can't let this series rot forever, so I'll apply it to net-next.
> 
> Unfortunately, I had to revert, this breaks the build:
> 
> arch/x86/kernel/cpu/intel_rdt_rdtgroup.c:1506:7: error: too few arguments to function ‘__kernfs_create_file’
>   kn = __kernfs_create_file(parent_kn, name, 0444, 0,
> 

I've got a fix for this. New __kernfs_create_file() users were added
since the v1 of the patch set (defconfig didn't build that code for me).
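
The class of breakage can be sketched roughly as follows. This is a userspace model and not the actual fix, which is not shown in this thread: `model_create_file`, `model_rdt_style_caller`, `MODEL_ROOT_UID` and `MODEL_ROOT_GID` are hypothetical names, and the parameter order is an assumption made for illustration. The point is only that once a creation helper takes explicit ownership arguments, every caller added in the meantime has to be updated to pass them, typically with global root to keep the old behaviour.

```c
#include <stddef.h>

/* Userspace model, not kernel code: every name below is hypothetical. */
#define MODEL_ROOT_UID 0u
#define MODEL_ROOT_GID 0u

struct model_node {
    const char *name;
    unsigned int mode;
    unsigned int uid;
    unsigned int gid;
    long size;
};

/* Helper after the change: ownership is an explicit argument, so a
 * caller still written as (name, mode, size) no longer compiles. */
static struct model_node model_create_file(const char *name,
                                           unsigned int mode,
                                           unsigned int uid,
                                           unsigned int gid,
                                           long size)
{
    struct model_node node = { name, mode, uid, gid, size };
    return node;
}

/* A caller outside the networking code that wants the old behaviour
 * passes global root explicitly. */
static struct model_node model_rdt_style_caller(const char *name)
{
    return model_create_file(name, 0444, MODEL_ROOT_UID, MODEL_ROOT_GID, 0);
}
```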

However, I'm starting to question my assumption that sufficient access
control checks are all in place for the attributes affected by patch #5.
I see a few affected attributes which don't make any capable() calls and
I'm not yet through the entire list.

My current plan is to roll in my build failure fix, drop patch #5,
retest and resubmit as a v3. I wasn't able to get to that today but
should be able to by the end of the week.

Tyler