[bpf-next,0/4] sys_bpf() access control via /dev/bpf

Message ID 20190625182303.874270-1-songliubraving@fb.com

Message

Song Liu June 25, 2019, 6:22 p.m. UTC
Currently, most access to sys_bpf() is limited to root. However, there are
use cases that would benefit from non-privileged use of sys_bpf(), e.g.
systemd.

This set introduces a new model to control the access to sys_bpf(). A
special device, /dev/bpf, is introduced to manage access to sys_bpf().
Users with access to open /dev/bpf will be able to access most of
sys_bpf() features. The user can get access to sys_bpf() by opening /dev/bpf
and using ioctl to get/put permission.
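
A minimal userspace sketch of the get/put flow described above, assuming a
kernel with this series applied. The request codes EXAMPLE_BPF_DEV_GET_PERM
and EXAMPLE_BPF_DEV_PUT_PERM and both helper names are hypothetical
placeholders: the cover letter does not state the actual ioctl numbers, and
the real helpers are the libbpf_[get|put]_bpf_permission() APIs from patch 3.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Hypothetical request codes; the real values are defined by the series. */
#define EXAMPLE_BPF_DEV_GET_PERM _IO(0xBF, 0x01)
#define EXAMPLE_BPF_DEV_PUT_PERM _IO(0xBF, 0x02)

/*
 * Open the device and ask for permission to use sys_bpf().
 * Returns the open fd on success, or a negative errno value on failure.
 */
static int example_get_bpf_permission(const char *dev_path)
{
	int fd = open(dev_path, O_RDWR);

	if (fd < 0)
		return -errno;

	if (ioctl(fd, EXAMPLE_BPF_DEV_GET_PERM) < 0) {
		int err = -errno;	/* save errno before close() can change it */

		close(fd);
		return err;
	}
	return fd;
}

/*
 * Drop the permission and close the device.
 * Returns 0 on success, or a negative errno value if the ioctl failed.
 */
static int example_put_bpf_permission(int fd)
{
	int err = 0;

	if (ioctl(fd, EXAMPLE_BPF_DEV_PUT_PERM) < 0)
		err = -errno;
	close(fd);
	return err;
}
```

A caller would do fd = example_get_bpf_permission("/dev/bpf"), issue its
sys_bpf() calls, then call example_put_bpf_permission(fd). On a kernel without
the device node the open fails with -ENOENT, so the caller can keep its
existing privileged path.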

The permission to access sys_bpf() is marked by bit TASK_BPF_FLAG_PERMITTED
in task_struct. During fork(), child will not inherit this bit.
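
The flag semantics above can be illustrated with a toy userspace model. This
is only a sketch of the described behavior, not the kernel code: the struct,
the field name, the bit value and all model_* helpers are assumptions made for
illustration.

```c
#include <stdbool.h>

/* Hypothetical bit value; the cover letter names the flag, not its value. */
#define MODEL_TASK_BPF_FLAG_PERMITTED (1UL << 0)

/* Toy stand-in for the relevant part of task_struct (field name assumed). */
struct model_task {
	unsigned long bpf_flags;
};

/* "get" permission: set the bit on the calling task. */
static void model_get_permission(struct model_task *t)
{
	t->bpf_flags |= MODEL_TASK_BPF_FLAG_PERMITTED;
}

/* "put" permission: clear the bit again. */
static void model_put_permission(struct model_task *t)
{
	t->bpf_flags &= ~MODEL_TASK_BPF_FLAG_PERMITTED;
}

static bool model_bpf_permitted(const struct model_task *t)
{
	return (t->bpf_flags & MODEL_TASK_BPF_FLAG_PERMITTED) != 0;
}

/* The child starts as a copy of the parent with the permission bit cleared. */
static struct model_task model_fork(const struct model_task *parent)
{
	struct model_task child = *parent;

	child.bpf_flags &= ~MODEL_TASK_BPF_FLAG_PERMITTED;
	return child;
}
```

In this model a child that needs sys_bpf() has to open /dev/bpf and get the
permission itself; it never inherits the bit from its parent.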

libbpf APIs libbpf_[get|put]_bpf_permission() are added to help get and
put the permission. bpftool is updated to use these APIs.

Song Liu (4):
  bpf: unprivileged BPF access via /dev/bpf
  bpf: sync tools/include/uapi/linux/bpf.h
  libbpf: add libbpf_[get|put]_bpf_permission()
  bpftool: use libbpf_[get|put]_bpf_permission()

 Documentation/ioctl/ioctl-number.txt |  1 +
 include/linux/bpf.h                  | 12 +++++
 include/linux/sched.h                |  8 ++++
 include/uapi/linux/bpf.h             |  5 ++
 kernel/bpf/arraymap.c                |  2 +-
 kernel/bpf/cgroup.c                  |  2 +-
 kernel/bpf/core.c                    |  4 +-
 kernel/bpf/cpumap.c                  |  2 +-
 kernel/bpf/devmap.c                  |  2 +-
 kernel/bpf/hashtab.c                 |  4 +-
 kernel/bpf/lpm_trie.c                |  2 +-
 kernel/bpf/offload.c                 |  2 +-
 kernel/bpf/queue_stack_maps.c        |  2 +-
 kernel/bpf/reuseport_array.c         |  2 +-
 kernel/bpf/stackmap.c                |  2 +-
 kernel/bpf/syscall.c                 | 72 +++++++++++++++++++++-------
 kernel/bpf/verifier.c                |  2 +-
 kernel/bpf/xskmap.c                  |  2 +-
 kernel/fork.c                        |  4 ++
 net/core/filter.c                    |  6 +--
 tools/bpf/bpftool/feature.c          |  2 +-
 tools/bpf/bpftool/main.c             |  5 ++
 tools/include/uapi/linux/bpf.h       |  5 ++
 tools/lib/bpf/libbpf.c               | 54 +++++++++++++++++++++
 tools/lib/bpf/libbpf.h               |  7 +++
 tools/lib/bpf/libbpf.map             |  2 +
 26 files changed, 178 insertions(+), 35 deletions(-)

--
2.17.1

Comments

Stanislav Fomichev June 25, 2019, 8:51 p.m. UTC | #1
On 06/25, Song Liu wrote:
> Currently, most access to sys_bpf() is limited to root. However, there are
> use cases that would benefit from non-privileged use of sys_bpf(), e.g.
> systemd.
> 
> This set introduces a new model to control the access to sys_bpf(). A
> special device, /dev/bpf, is introduced to manage access to sys_bpf().
> Users with access to open /dev/bpf will be able to access most of
> sys_bpf() features. The user can get access to sys_bpf() by opening /dev/bpf
> and using ioctl to get/put permission.
> 
> The permission to access sys_bpf() is marked by bit TASK_BPF_FLAG_PERMITTED
> in task_struct. During fork(), child will not inherit this bit.
2c: if we are going to have an fd, I'd vote for a proper fd based access
checks instead of a per-task flag, so we can do:
	ioctl(fd, BPF_MAP_CREATE, uattr, sizeof(uattr))

(and pass this fd around)

I do understand that it breaks current assumptions that libbpf has,
but maybe we can extend _xattr variants to accept an optional fd (and try
to fall back to sysctl if it's absent/not working)?
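
One way to read this suggestion, as a hedged sketch rather than a proposed
API: a wrapper takes an optional fd, tries the fd-based path first, and falls
back when the fd is absent or the ioctl fails. Using the BPF command number as
the ioctl request mirrors the pseudo-call above and is purely hypothetical; no
kernel implements it. The fallback is assumed here to be the existing bpf()
syscall path, and example_bpf_cmd is an illustrative name, not a libbpf
function.

```c
#define _GNU_SOURCE
#include <linux/bpf.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Issue a BPF command, preferring a permission fd when one is supplied.
 * perm_fd < 0 means "no fd"; in that case only the syscall path is used.
 * Returns the command's result, or -1 with errno set on failure.
 */
static int example_bpf_cmd(int perm_fd, enum bpf_cmd cmd,
			   union bpf_attr *attr, unsigned int size)
{
	if (perm_fd >= 0) {
		/* Hypothetical: the BPF command doubles as the ioctl request. */
		int ret = ioctl(perm_fd, (unsigned long)cmd, attr);

		if (ret >= 0)
			return ret;
		/* fd-based path absent or not working: fall through. */
	}

	/* Existing path: the plain bpf() syscall. */
	return (int)syscall(__NR_bpf, cmd, attr, size);
}
```

With perm_fd set to -1 the wrapper behaves like today's direct syscall, which
is what would let existing callers keep working unchanged.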

Alexei Starovoitov June 25, 2019, 9 p.m. UTC | #2
On 6/25/19 1:51 PM, Stanislav Fomichev wrote:
> On 06/25, Song Liu wrote:
>> Currently, most access to sys_bpf() is limited to root. However, there are
>> use cases that would benefit from non-privileged use of sys_bpf(), e.g.
>> systemd.
>>
>> This set introduces a new model to control the access to sys_bpf(). A
>> special device, /dev/bpf, is introduced to manage access to sys_bpf().
>> Users with access to open /dev/bpf will be able to access most of
>> sys_bpf() features. The user can get access to sys_bpf() by opening /dev/bpf
>> and using ioctl to get/put permission.
>>
>> The permission to access sys_bpf() is marked by bit TASK_BPF_FLAG_PERMITTED
>> in task_struct. During fork(), child will not inherit this bit.
> 2c: if we are going to have an fd, I'd vote for a proper fd based access
> checks instead of a per-task flag, so we can do:
> 	ioctl(fd, BPF_MAP_CREATE, uattr, sizeof(uattr))
> 
> (and pass this fd around)
> 
> I do understand that it breaks current assumptions that libbpf has,
> but maybe we can extend _xattr variants to accept an optional fd (and try
> to fall back to sysctl if it's absent/not working)?

both of these ideas were discussed at lsfmm where you were present.
I'm not sure why you're bringing it up again?
Stanislav Fomichev June 25, 2019, 9:19 p.m. UTC | #3
On 06/25, Alexei Starovoitov wrote:
> On 6/25/19 1:51 PM, Stanislav Fomichev wrote:
> > On 06/25, Song Liu wrote:
> >> Currently, most access to sys_bpf() is limited to root. However, there are
> >> use cases that would benefit from non-privileged use of sys_bpf(), e.g.
> >> systemd.
> >>
> >> This set introduces a new model to control the access to sys_bpf(). A
> >> special device, /dev/bpf, is introduced to manage access to sys_bpf().
> >> Users with access to open /dev/bpf will be able to access most of
> >> sys_bpf() features. The user can get access to sys_bpf() by opening /dev/bpf
> >> and using ioctl to get/put permission.
> >>
> >> The permission to access sys_bpf() is marked by bit TASK_BPF_FLAG_PERMITTED
> >> in task_struct. During fork(), child will not inherit this bit.
> > 2c: if we are going to have an fd, I'd vote for a proper fd based access
> > checks instead of a per-task flag, so we can do:
> > 	ioctl(fd, BPF_MAP_CREATE, uattr, sizeof(uattr))
> > 
> > (and pass this fd around)
> > 
> > I do understand that it breaks current assumptions that libbpf has,
> > but maybe we can extend _xattr variants to accept an optional fd (and try
> > to fall back to sysctl if it's absent/not working)?
> 
> both of these ideas were discussed at lsfmm where you were present.
> I'm not sure why you're bringing it up again?
Did we actually settle on anything? In that case feel free to ignore me,
maybe I missed that. I remember there were pros/cons for both implementations.
Alexei Starovoitov June 25, 2019, 10:47 p.m. UTC | #4
On 6/25/19 2:19 PM, Stanislav Fomichev wrote:
> On 06/25, Alexei Starovoitov wrote:
>> On 6/25/19 1:51 PM, Stanislav Fomichev wrote:
>>> On 06/25, Song Liu wrote:
>>>> Currently, most access to sys_bpf() is limited to root. However, there are
>>>> use cases that would benefit from non-privileged use of sys_bpf(), e.g.
>>>> systemd.
>>>>
>>>> This set introduces a new model to control the access to sys_bpf(). A
>>>> special device, /dev/bpf, is introduced to manage access to sys_bpf().
>>>> Users with access to open /dev/bpf will be able to access most of
>>>> sys_bpf() features. The user can get access to sys_bpf() by opening /dev/bpf
>>>> and using ioctl to get/put permission.
>>>>
>>>> The permission to access sys_bpf() is marked by bit TASK_BPF_FLAG_PERMITTED
>>>> in task_struct. During fork(), child will not inherit this bit.
>>> 2c: if we are going to have an fd, I'd vote for a proper fd based access
>>> checks instead of a per-task flag, so we can do:
>>> 	ioctl(fd, BPF_MAP_CREATE, uattr, sizeof(uattr))
>>>
>>> (and pass this fd around)
>>>
>>> I do understand that it breaks current assumptions that libbpf has,
>>> but maybe we can extend _xattr variants to accept an optional fd (and try
>>> to fall back to sysctl if it's absent/not working)?
>>
>> both of these ideas were discussed at lsfmm where you were present.
>> I'm not sure why you're bringing it up again?
> Did we actually settle on anything? In that case feel free to ignore me,
> maybe I missed that. I remember there were pros/cons for both implementations.

yes. That was my understanding from lsfmm.
Which was:
1. replicating all commands via ioctl is not going to work.
   Also ioctl cannot return fd.
2. adding fd to all structs inside bpf_attr is a big churn on uapi.
   all future structs would need to have this extra fd as well.
   I don't like that kind of crutch to be carried over and over again.

The only thing we can consider instead of ioctl is to add single
new command for bpf syscall that will take that fd and apply
the attribute to task struct.
ioctl on that fd or new command look equivalent to me.
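
The single-command alternative can be sketched as a toy model. Everything
here is hypothetical: no such command exists in the bpf uapi, and the enum
values, struct names and MODEL_DEV_BPF_FD stand-in are made up for
illustration. The model also simplifies by treating every existing command as
requiring the permission.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical command set; only the last entry is new. */
enum model_bpf_cmd {
	MODEL_BPF_MAP_CREATE,
	MODEL_BPF_PROG_LOAD,
	MODEL_BPF_TASK_GET_PERM,	/* the single new command */
};

/* Toy stand-in for the per-task state. */
struct cmd_model_task {
	bool bpf_permitted;
};

/* Toy attribute: perm_fd is read only by MODEL_BPF_TASK_GET_PERM. */
struct model_attr {
	int perm_fd;
};

/* Stand-in for "this fd is an open /dev/bpf"; a kernel would check the file. */
#define MODEL_DEV_BPF_FD 3

static int model_sys_bpf(struct cmd_model_task *task, enum model_bpf_cmd cmd,
			 const struct model_attr *attr)
{
	if (cmd == MODEL_BPF_TASK_GET_PERM) {
		if (attr->perm_fd != MODEL_DEV_BPF_FD)
			return -EBADF;
		task->bpf_permitted = true;	/* applied to the task once */
		return 0;
	}

	/* Existing commands never look at an fd in their attributes. */
	if (!task->bpf_permitted)
		return -EPERM;
	return 0;
}
```

The point of the model is that only one command ever reads the fd, so no
existing or future struct inside bpf_attr needs an extra fd field.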