Message ID | 20190829051253.1927291-1-ast@kernel.org |
---|---|
State | Changes Requested |
Delegated to: | BPF Maintainers |
Headers | show |
Series | [v2,bpf-next,1/3] capability: introduce CAP_BPF and CAP_TRACING | expand |
> On Aug 28, 2019, at 10:12 PM, Alexei Starovoitov <ast@kernel.org> wrote: > [...] > - Creation of [ku][ret]probe > - Accessing arbitrary kernel memory via kprobe + probe_kernel_read > - Attach tracing bpf programs to perf events > - Access to kallsyms > > Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Song Liu <songliubraving@fb.com>
Alexei Starovoitov <ast@kernel.org> writes: > CAP_BPF allows the following BPF operations: > - Loading all types of BPF programs > - Creating all types of BPF maps except: > - stackmap that needs CAP_TRACING > - devmap that needs CAP_NET_ADMIN > - cpumap that needs CAP_SYS_ADMIN Why CAP_SYS_ADMIN instead of CAP_NET_ADMIN for cpumap? -Toke
Le 29/08/2019 à 07:12, Alexei Starovoitov a écrit : [snip] > CAP_BPF and CAP_NET_ADMIN together allow the following: > - Attach to cgroup-bpf hooks and query > - skb, xdp, flow_dissector test_run command > > CAP_NET_ADMIN allows: > - Attach networking bpf programs to xdp, tc, lwt, flow dissector I'm not sure to understand the difference between these last two points. But, with the current kernel, CAP_NET_ADMIN is not enough to attach bpf prog with tc and it's still not enough after your patch. The following command is rejected: $ tc filter add dev eth0 ingress matchall action bpf obj ./tc_test_kern.o sec test Prog section 'test' rejected: Operation not permitted (1)! - Type: 4 - Instructions: 22 (0 over limit) - License: GPL Verifier analysis: Error fetching program/map! bad action parsing parse_action: bad value (5:bpf)! Illegal "action" $ Like Andy, I'm also wondering about the backward compatibility. With my current docker, I'm able to play with tc bpf with CAP_SYS_ADMIN. But if I update my kernel with your patches, CAP_SYS_ADMIN doesn't allow anymore that and CAP_BPF is not implemented in my current docker, thus I cannot give the correct capabilities. In other words, an old docker cannot run on a new kernel. Regards, Nicolas
On 8/29/19 7:12 AM, Alexei Starovoitov wrote: [...] > > +/* > + * CAP_BPF allows the following BPF operations: > + * - Loading all types of BPF programs > + * - Creating all types of BPF maps except: > + * - stackmap that needs CAP_TRACING > + * - devmap that needs CAP_NET_ADMIN > + * - cpumap that needs CAP_SYS_ADMIN > + * - Advanced verifier features > + * - Indirect variable access > + * - Bounded loops > + * - BPF to BPF function calls > + * - Scalar precision tracking > + * - Larger complexity limits > + * - Dead code elimination > + * - And potentially other features > + * - Use of pointer-to-integer conversions in BPF programs > + * - Bypassing of speculation attack hardening measures > + * - Loading BPF Type Format (BTF) data > + * - Iterate system wide loaded programs, maps, BTF objects > + * - Retrieve xlated and JITed code of BPF programs > + * - Access maps and programs via id > + * - Use bpf_spin_lock() helper This is still very wide. Consider following example: app has CAP_BPF + CAP_NET_ADMIN. Why can't we in this case *only* allow loading networking related [plus generic] maps and programs? If it doesn't have CAP_TRACING, what would be a reason to allow loading it? Same vice versa. There are some misc program types like the infraread stuff, but they could continue to live under [CAP_BPF +] CAP_SYS_ADMIN as fallback. I think categorizing a specific list of prog and map types might be more clear than disallowing some helpers like below (e.g. why choice of bpf_probe_read() but not bpf_probe_write_user() etc). > + * CAP_BPF and CAP_TRACING together allow the following: > + * - bpf_probe_read to read arbitrary kernel memory > + * - bpf_trace_printk to print data to ftrace ring buffer > + * - Attach to raw_tracepoint > + * - Query association between kprobe/tracepoint and bpf program > + * > + * CAP_BPF and CAP_NET_ADMIN together allow the following: > + * - Attach to cgroup-bpf hooks and query > + * - skb, xdp, flow_dissector test_run command > + * > + * CAP_NET_ADMIN allows: > + * - Attach networking bpf programs to xdp, tc, lwt, flow dissector > + */ > +#define CAP_BPF 38 > + > +/* > + * CAP_TRACING allows: > + * - Full use of perf_event_open(), similarly to the effect of > + * kernel.perf_event_paranoid == -1 > + * - Full use of tracefs > + * - Creation of [ku][ret]probe > + * - Accessing arbitrary kernel memory via kprobe + probe_kernel_read > + * - Attach tracing bpf programs to perf events > + * - Access to kallsyms > + */ > +#define CAP_TRACING 39 > > -#define CAP_LAST_CAP CAP_AUDIT_READ > +#define CAP_LAST_CAP CAP_TRACING > > #define cap_valid(x) ((x) >= 0 && (x) <= CAP_LAST_CAP) > > diff --git a/security/selinux/include/classmap.h b/security/selinux/include/classmap.h > index 201f7e588a29..0b364e245163 100644 > --- a/security/selinux/include/classmap.h > +++ b/security/selinux/include/classmap.h > @@ -26,9 +26,9 @@ > "audit_control", "setfcap" > > #define COMMON_CAP2_PERMS "mac_override", "mac_admin", "syslog", \ > - "wake_alarm", "block_suspend", "audit_read" > + "wake_alarm", "block_suspend", "audit_read", "bpf", "tracing" > > -#if CAP_LAST_CAP > CAP_AUDIT_READ > +#if CAP_LAST_CAP > CAP_TRACING > #error New capability defined, please update COMMON_CAP2_PERMS. > #endif > >
> On Aug 29, 2019, at 8:47 AM, Daniel Borkmann <daniel@iogearbox.net> wrote: > >> On 8/29/19 7:12 AM, Alexei Starovoitov wrote: >> [...] >> +/* >> + * CAP_BPF allows the following BPF operations: >> + * - Loading all types of BPF programs >> + * - Creating all types of BPF maps except: >> + * - stackmap that needs CAP_TRACING >> + * - devmap that needs CAP_NET_ADMIN >> + * - cpumap that needs CAP_SYS_ADMIN >> + * - Advanced verifier features >> + * - Indirect variable access >> + * - Bounded loops >> + * - BPF to BPF function calls >> + * - Scalar precision tracking >> + * - Larger complexity limits >> + * - Dead code elimination >> + * - And potentially other features >> + * - Use of pointer-to-integer conversions in BPF programs >> + * - Bypassing of speculation attack hardening measures >> + * - Loading BPF Type Format (BTF) data >> + * - Iterate system wide loaded programs, maps, BTF objects >> + * - Retrieve xlated and JITed code of BPF programs >> + * - Access maps and programs via id >> + * - Use bpf_spin_lock() helper > > This is still very wide. Consider following example: app has CAP_BPF + > CAP_NET_ADMIN. Why can't we in this case *only* allow loading networking > related [plus generic] maps and programs? If it doesn't have CAP_TRACING, > what would be a reason to allow loading it? Same vice versa. There are > some misc program types like the infraread stuff, but they could continue > to live under [CAP_BPF +] CAP_SYS_ADMIN as fallback. I think categorizing > a specific list of prog and map types might be more clear than disallowing > some helpers like below (e.g. why choice of bpf_probe_read() but not > bpf_probe_write_user() etc). Wow, I didn’t notice that bpf_probe_write_user() existed. That should need something like CAP_PTRACE or CAP_SYS_ADMIN. I'm starting to think that something like this: https://lore.kernel.org/bpf/968f3551247a43e1104b198f2e58fb0595d425e7.1565040372.git.luto@kernel.org/ should maybe be finished before CAP_BPF happens at all. It really looks like the bpf operations that need privilege need to get fully catalogued and dealt with rather than just coming up with a new capability that covers a huge swath. (bpf_probe_write_user() is also terminally broken on architectures like s390x, but that's not really relevant right now. I'm a bit surprised it works on x86 with SMAP, though.)
On Thu, Aug 29, 2019 at 09:44:18AM +0200, Toke Høiland-Jørgensen wrote: > Alexei Starovoitov <ast@kernel.org> writes: > > > CAP_BPF allows the following BPF operations: > > - Loading all types of BPF programs > > - Creating all types of BPF maps except: > > - stackmap that needs CAP_TRACING > > - devmap that needs CAP_NET_ADMIN > > - cpumap that needs CAP_SYS_ADMIN > > Why CAP_SYS_ADMIN instead of CAP_NET_ADMIN for cpumap? Currently it's cap_sys_admin and I think it should stay this way because it creates kthreads.
On Thu, Aug 29, 2019 at 03:36:42PM +0200, Nicolas Dichtel wrote: > Le 29/08/2019 à 07:12, Alexei Starovoitov a écrit : > [snip] > > CAP_BPF and CAP_NET_ADMIN together allow the following: > > - Attach to cgroup-bpf hooks and query > > - skb, xdp, flow_dissector test_run command > > > > CAP_NET_ADMIN allows: > > - Attach networking bpf programs to xdp, tc, lwt, flow dissector > I'm not sure to understand the difference between these last two points. > But, with the current kernel, CAP_NET_ADMIN is not enough to attach bpf prog > with tc and it's still not enough after your patch. > The following command is rejected: > $ tc filter add dev eth0 ingress matchall action bpf obj ./tc_test_kern.o sec test > > Prog section 'test' rejected: Operation not permitted (1)! > - Type: 4 > - Instructions: 22 (0 over limit) > - License: GPL > > Verifier analysis: > > Error fetching program/map! > bad action parsing > parse_action: bad value (5:bpf)! > Illegal "action" because tc/iproute2 is doing load and attach. Currently load needs cap_sys_admin and attach needs cap_net_admin.
Alexei Starovoitov <alexei.starovoitov@gmail.com> writes: > On Thu, Aug 29, 2019 at 09:44:18AM +0200, Toke Høiland-Jørgensen wrote: >> Alexei Starovoitov <ast@kernel.org> writes: >> >> > CAP_BPF allows the following BPF operations: >> > - Loading all types of BPF programs >> > - Creating all types of BPF maps except: >> > - stackmap that needs CAP_TRACING >> > - devmap that needs CAP_NET_ADMIN >> > - cpumap that needs CAP_SYS_ADMIN >> >> Why CAP_SYS_ADMIN instead of CAP_NET_ADMIN for cpumap? > > Currently it's cap_sys_admin and I think it should stay this way > because it creates kthreads. Ah, right. I can sorta see that makes sense because of the kthreads, but it also means that you can use all of XDP *except* cpumap with CAP_NET_ADMIN+CAP_BPF. That is bound to create confusion, isn't it? -Toke
On Thu, 29 Aug 2019 20:05:49 +0200 Toke Høiland-Jørgensen <toke@redhat.com> wrote: > Alexei Starovoitov <alexei.starovoitov@gmail.com> writes: > > > On Thu, Aug 29, 2019 at 09:44:18AM +0200, Toke Høiland-Jørgensen wrote: > >> Alexei Starovoitov <ast@kernel.org> writes: > >> > >> > CAP_BPF allows the following BPF operations: > >> > - Loading all types of BPF programs > >> > - Creating all types of BPF maps except: > >> > - stackmap that needs CAP_TRACING > >> > - devmap that needs CAP_NET_ADMIN > >> > - cpumap that needs CAP_SYS_ADMIN > >> > >> Why CAP_SYS_ADMIN instead of CAP_NET_ADMIN for cpumap? > > > > Currently it's cap_sys_admin and I think it should stay this way > > because it creates kthreads. > > Ah, right. I can sorta see that makes sense because of the kthreads, but > it also means that you can use all of XDP *except* cpumap with > CAP_NET_ADMIN+CAP_BPF. That is bound to create confusion, isn't it? Hmm... I see 'cpumap' primarily as a network stack feature. It is about starting the network stack on a specific CPU, allocating and building SKBs on that remote CPU. It can only be used together with XDP_REDIRECT. I would prefer CAP_NET_ADMIN like the devmap, to keep the XDP capabilities consistent.
On Thu, Aug 29, 2019 at 10:25:30PM +0200, Jesper Dangaard Brouer wrote: > On Thu, 29 Aug 2019 20:05:49 +0200 > Toke Høiland-Jørgensen <toke@redhat.com> wrote: > > > Alexei Starovoitov <alexei.starovoitov@gmail.com> writes: > > > > > On Thu, Aug 29, 2019 at 09:44:18AM +0200, Toke Høiland-Jørgensen wrote: > > >> Alexei Starovoitov <ast@kernel.org> writes: > > >> > > >> > CAP_BPF allows the following BPF operations: > > >> > - Loading all types of BPF programs > > >> > - Creating all types of BPF maps except: > > >> > - stackmap that needs CAP_TRACING > > >> > - devmap that needs CAP_NET_ADMIN > > >> > - cpumap that needs CAP_SYS_ADMIN > > >> > > >> Why CAP_SYS_ADMIN instead of CAP_NET_ADMIN for cpumap? > > > > > > Currently it's cap_sys_admin and I think it should stay this way > > > because it creates kthreads. > > > > Ah, right. I can sorta see that makes sense because of the kthreads, but > > it also means that you can use all of XDP *except* cpumap with > > CAP_NET_ADMIN+CAP_BPF. That is bound to create confusion, isn't it? > > Hmm... I see 'cpumap' primarily as a network stack feature. It is about > starting the network stack on a specific CPU, allocating and building > SKBs on that remote CPU. It can only be used together with XDP_REDIRECT. > I would prefer CAP_NET_ADMIN like the devmap, to keep the XDP > capabilities consistent. I don't mind relaxing cpumap to cap_net_admin. Looking at the reaction to the rest of the set. I'd rather discuss it and do it later after basic cap_bpf is in.
On Thu, Aug 29, 2019 at 8:47 AM Daniel Borkmann <daniel@iogearbox.net> wrote: > > On 8/29/19 7:12 AM, Alexei Starovoitov wrote: > [...] > > > > +/* > > + * CAP_BPF allows the following BPF operations: > > + * - Loading all types of BPF programs > > + * - Creating all types of BPF maps except: > > + * - stackmap that needs CAP_TRACING > > + * - devmap that needs CAP_NET_ADMIN > > + * - cpumap that needs CAP_SYS_ADMIN > > + * - Advanced verifier features > > + * - Indirect variable access > > + * - Bounded loops > > + * - BPF to BPF function calls > > + * - Scalar precision tracking > > + * - Larger complexity limits > > + * - Dead code elimination > > + * - And potentially other features > > + * - Use of pointer-to-integer conversions in BPF programs > > + * - Bypassing of speculation attack hardening measures > > + * - Loading BPF Type Format (BTF) data > > + * - Iterate system wide loaded programs, maps, BTF objects > > + * - Retrieve xlated and JITed code of BPF programs > > + * - Access maps and programs via id > > + * - Use bpf_spin_lock() helper > > This is still very wide. 'still very wide' ? you make it sound like it's a bad thing. Covering all of bpf with single CAP_BPF is #1 goal of this set. > Consider following example: app has CAP_BPF +> CAP_NET_ADMIN. Why can't we in this case *only* allow loading networking > related [plus generic] maps and programs? If it doesn't have CAP_TRACING, > what would be a reason to allow loading it? Same vice versa. There are > some misc program types like the infraread stuff, but they could continue > to live under [CAP_BPF +] CAP_SYS_ADMIN as fallback. I think categorizing > a specific list of prog and map types might be more clear than disallowing > some helpers like below (e.g. why choice of bpf_probe_read() but not > bpf_probe_write_user() etc). It kinda makes sense: cap_bpf+cap_net_admin allows networking progs. cap_bpf+cap_tracing allows tracing progs. But what to do with cg_sysctl, cg_device, lirc ? They are clearly neither. Invent yet another cap_foo for them? or let them under cap_bpf alone? If cap_bpf alone is enough then why bother with bpf+net_admin for networking? It's not making anything cleaner. Only confuses users. Also bpf_trace_printk() is using ftrace and can print arbitrary memory. In that sense it's no different than bpf_probe_read. Both should be under CAP_TRACING. But bpf_trace_printk() is available to all progs. Even to socket filters under cap_sys_admin today. With this patch set I'm allowing bpf_trace_printk() under CAP_TRACING. Same goes to bpf_probe_read. High level description: cap_bpf alone allows loading of all progs except when later cap_net_admin or cap_tracing will _not_ be able to filter out the helper at attach time that shouldn't be there. Example of how this patch set works: - to load and attach networking progs both cap_bpf and cap_net_admin are necessary. - to load and attach tracing progs both cap_bpf and cap_tracing are necessary. when networking prog is using bpf_trace_printk then cap_tracing is needed too. And it's checked at load time. If we do what you're proposing: "lets allow load of all networking with bpf+net_admin" then this won't work for bpf_trace_printk. Per helper function capability check is still needed. And since it's needed I see no reason to limit networking progs to bpf+net_admin at load time. Load time is still cap_bpf only. And helpers will be filtered out at attach by net_admin.
diff --git a/include/uapi/linux/capability.h b/include/uapi/linux/capability.h index 240fdb9a60f6..664e07d12888 100644 --- a/include/uapi/linux/capability.h +++ b/include/uapi/linux/capability.h @@ -366,8 +366,57 @@ struct vfs_ns_cap_data { #define CAP_AUDIT_READ 37 +/* + * CAP_BPF allows the following BPF operations: + * - Loading all types of BPF programs + * - Creating all types of BPF maps except: + * - stackmap that needs CAP_TRACING + * - devmap that needs CAP_NET_ADMIN + * - cpumap that needs CAP_SYS_ADMIN + * - Advanced verifier features + * - Indirect variable access + * - Bounded loops + * - BPF to BPF function calls + * - Scalar precision tracking + * - Larger complexity limits + * - Dead code elimination + * - And potentially other features + * - Use of pointer-to-integer conversions in BPF programs + * - Bypassing of speculation attack hardening measures + * - Loading BPF Type Format (BTF) data + * - Iterate system wide loaded programs, maps, BTF objects + * - Retrieve xlated and JITed code of BPF programs + * - Access maps and programs via id + * - Use bpf_spin_lock() helper + * + * CAP_BPF and CAP_TRACING together allow the following: + * - bpf_probe_read to read arbitrary kernel memory + * - bpf_trace_printk to print data to ftrace ring buffer + * - Attach to raw_tracepoint + * - Query association between kprobe/tracepoint and bpf program + * + * CAP_BPF and CAP_NET_ADMIN together allow the following: + * - Attach to cgroup-bpf hooks and query + * - skb, xdp, flow_dissector test_run command + * + * CAP_NET_ADMIN allows: + * - Attach networking bpf programs to xdp, tc, lwt, flow dissector + */ +#define CAP_BPF 38 + +/* + * CAP_TRACING allows: + * - Full use of perf_event_open(), similarly to the effect of + * kernel.perf_event_paranoid == -1 + * - Full use of tracefs + * - Creation of [ku][ret]probe + * - Accessing arbitrary kernel memory via kprobe + probe_kernel_read + * - Attach tracing bpf programs to perf events + * - Access to kallsyms + */ +#define CAP_TRACING 39 -#define CAP_LAST_CAP CAP_AUDIT_READ +#define CAP_LAST_CAP CAP_TRACING #define cap_valid(x) ((x) >= 0 && (x) <= CAP_LAST_CAP) diff --git a/security/selinux/include/classmap.h b/security/selinux/include/classmap.h index 201f7e588a29..0b364e245163 100644 --- a/security/selinux/include/classmap.h +++ b/security/selinux/include/classmap.h @@ -26,9 +26,9 @@ "audit_control", "setfcap" #define COMMON_CAP2_PERMS "mac_override", "mac_admin", "syslog", \ - "wake_alarm", "block_suspend", "audit_read" + "wake_alarm", "block_suspend", "audit_read", "bpf", "tracing" -#if CAP_LAST_CAP > CAP_AUDIT_READ +#if CAP_LAST_CAP > CAP_TRACING #error New capability defined, please update COMMON_CAP2_PERMS. #endif
CAP_BPF allows the following BPF operations: - Loading all types of BPF programs - Creating all types of BPF maps except: - stackmap that needs CAP_TRACING - devmap that needs CAP_NET_ADMIN - cpumap that needs CAP_SYS_ADMIN - Advanced verifier features - Indirect variable access - Bounded loops - BPF to BPF function calls - Scalar precision tracking - Larger complexity limits - Dead code elimination - And potentially other features - Use of pointer-to-integer conversions in BPF programs - Bypassing of speculation attack hardening measures - Loading BPF Type Format (BTF) data - Iterate system wide loaded programs, maps, BTF objects - Retrieve xlated and JITed code of BPF programs - Access maps and programs via id - Use bpf_spin_lock() helper CAP_BPF and CAP_TRACING together allow the following: - bpf_probe_read to read arbitrary kernel memory - bpf_trace_printk to print data to ftrace ring buffer - Attach to raw_tracepoint - Query association between kprobe/tracepoint and bpf program CAP_BPF and CAP_NET_ADMIN together allow the following: - Attach to cgroup-bpf hooks and query - skb, xdp, flow_dissector test_run command CAP_NET_ADMIN allows: - Attach networking bpf programs to xdp, tc, lwt, flow dissector CAP_TRACING allows: - Full use of perf_event_open(), similarly to the effect of kernel.perf_event_paranoid == -1 - Full use of tracefs - Creation of [ku][ret]probe - Accessing arbitrary kernel memory via kprobe + probe_kernel_read - Attach tracing bpf programs to perf events - Access to kallsyms Signed-off-by: Alexei Starovoitov <ast@kernel.org> --- include/uapi/linux/capability.h | 51 ++++++++++++++++++++++++++++- security/selinux/include/classmap.h | 4 +-- 2 files changed, 52 insertions(+), 3 deletions(-)