diff mbox series

[23/27] bpf: Restrict kernel image access functions when the kernel is locked down

Message ID 20190325220954.29054-24-matthewgarrett@google.com
State Changes Requested
Delegated to: BPF Maintainers
Headers show
Series None | expand

Commit Message

Matthew Garrett March 25, 2019, 10:09 p.m. UTC
From: David Howells <dhowells@redhat.com>

There are some bpf functions can be used to read kernel memory:
bpf_probe_read, bpf_probe_write_user and bpf_trace_printk.  These allow
private keys in kernel memory (e.g. the hibernation image signing key) to
be read by an eBPF program and kernel memory to be altered without
restriction.

Completely prohibit the use of BPF when the kernel is locked down.

Suggested-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: netdev@vger.kernel.org
cc: Chun-Yi Lee <jlee@suse.com>
cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Matthew Garrett <matthewgarrett@google.com>
---
 kernel/bpf/syscall.c | 3 +++
 1 file changed, 3 insertions(+)

Comments

Stephen Hemminger March 25, 2019, 11:42 p.m. UTC | #1
On Mon, 25 Mar 2019 15:09:50 -0700
Matthew Garrett <matthewgarrett@google.com> wrote:

> From: David Howells <dhowells@redhat.com>
> 
> There are some bpf functions can be used to read kernel memory:
> bpf_probe_read, bpf_probe_write_user and bpf_trace_printk.  These allow
> private keys in kernel memory (e.g. the hibernation image signing key) to
> be read by an eBPF program and kernel memory to be altered without
> restriction.
> 
> Completely prohibit the use of BPF when the kernel is locked down.
> 
> Suggested-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
> Signed-off-by: David Howells <dhowells@redhat.com>
> cc: netdev@vger.kernel.org
> cc: Chun-Yi Lee <jlee@suse.com>
> cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
> Cc: Daniel Borkmann <daniel@iogearbox.net>
> Signed-off-by: Matthew Garrett <matthewgarrett@google.com>

Wouldn't this mean that Seccomp won't work in locked down mode?
Stephen Hemminger March 25, 2019, 11:59 p.m. UTC | #2
On Mon, 25 Mar 2019 16:42:21 -0700
Stephen Hemminger <stephen@networkplumber.org> wrote:

> On Mon, 25 Mar 2019 15:09:50 -0700
> Matthew Garrett <matthewgarrett@google.com> wrote:
> 
> > From: David Howells <dhowells@redhat.com>
> > 
> > There are some bpf functions can be used to read kernel memory:
> > bpf_probe_read, bpf_probe_write_user and bpf_trace_printk.  These allow
> > private keys in kernel memory (e.g. the hibernation image signing key) to
> > be read by an eBPF program and kernel memory to be altered without
> > restriction.
> > 
> > Completely prohibit the use of BPF when the kernel is locked down.
> > 
> > Suggested-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
> > Signed-off-by: David Howells <dhowells@redhat.com>
> > cc: netdev@vger.kernel.org
> > cc: Chun-Yi Lee <jlee@suse.com>
> > cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
> > Cc: Daniel Borkmann <daniel@iogearbox.net>
> > Signed-off-by: Matthew Garrett <matthewgarrett@google.com>  
> 
> Wouldn't this mean that Seccomp won't work in locked down mode?

Never mind. This is about bpf system call, not locking out all bpf in general.
Daniel Borkmann March 26, 2019, midnight UTC | #3
On 03/26/2019 12:42 AM, Stephen Hemminger wrote:
> On Mon, 25 Mar 2019 15:09:50 -0700
> Matthew Garrett <matthewgarrett@google.com> wrote:
> 
>> From: David Howells <dhowells@redhat.com>
>>
>> There are some bpf functions can be used to read kernel memory:
>> bpf_probe_read, bpf_probe_write_user and bpf_trace_printk.  These allow
>> private keys in kernel memory (e.g. the hibernation image signing key) to
>> be read by an eBPF program and kernel memory to be altered without
>> restriction.

I'm not sure where 'kernel memory to be altered without restriction' comes
from, but it's definitely a wrong statement.

>> Completely prohibit the use of BPF when the kernel is locked down.

In which scenarios will the lock-down mode be used? Mostly niche? I'm asking
as this would otherwise break a lot of existing stuff ... I'd prefer you find
a better solution to this than this straight -EPERM rejection.

>> Suggested-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
>> Signed-off-by: David Howells <dhowells@redhat.com>
>> cc: netdev@vger.kernel.org
>> cc: Chun-Yi Lee <jlee@suse.com>
>> cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
>> Cc: Daniel Borkmann <daniel@iogearbox.net>
>> Signed-off-by: Matthew Garrett <matthewgarrett@google.com>
> 
> Wouldn't this mean that Seccomp won't work in locked down mode?
>
Andy Lutomirski March 26, 2019, 12:10 a.m. UTC | #4
On Mon, Mar 25, 2019 at 4:42 PM Stephen Hemminger
<stephen@networkplumber.org> wrote:
>
> On Mon, 25 Mar 2019 15:09:50 -0700
> Matthew Garrett <matthewgarrett@google.com> wrote:
>
> > From: David Howells <dhowells@redhat.com>
> >
> > There are some bpf functions can be used to read kernel memory:
> > bpf_probe_read, bpf_probe_write_user and bpf_trace_printk.  These allow
> > private keys in kernel memory (e.g. the hibernation image signing key) to
> > be read by an eBPF program and kernel memory to be altered without
> > restriction.
> >
> > Completely prohibit the use of BPF when the kernel is locked down.
> >
> > Suggested-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
> > Signed-off-by: David Howells <dhowells@redhat.com>
> > cc: netdev@vger.kernel.org
> > cc: Chun-Yi Lee <jlee@suse.com>
> > cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
> > Cc: Daniel Borkmann <daniel@iogearbox.net>
> > Signed-off-by: Matthew Garrett <matthewgarrett@google.com>
>
> Wouldn't this mean that Seccomp won't work in locked down mode?

I wasn't cc'd on this series, nor was linux-api, so it's awkward to review.

A while back, I suggested an approach to actually make this stuff
mergeable: submit a patch series that adds lockdown mode, enables it
by command line option (and maybe sysctl) *only* and has either no
effect or only a token effect.  Then we can add actual features to
lockdown mode one at a time and review them separately.

And I'm going to complain loudly unless two things change about this
whole thing:

1. Lockdown mode becomes three states, not a boolean.  The states are:
no lockdown, best-effort-to-protect-kernel-integrity, and
best-effort-to-protect-kernel-secrecy-and-integrity.  And this BPF
mess illustrates why: most users will really strongly object to
turning off BPF when they actually just want to protect kernel
integrity.  And as far as I know, things like Secure Boot policy will
mostly care about integrity, not secrecy, and tracing and such should
work on a normal locked-down kernel.  So I think we need this knob.

2. All the proponents of this series, and the documentation, needs to
document that it's best effort.  There will always be security bugs,
and there will always be things we miss.


--Andy
Jordan Glover March 26, 2019, 1:54 p.m. UTC | #5
On Tuesday, March 26, 2019 12:00 AM, Daniel Borkmann <daniel@iogearbox.net> wrote:

> On 03/26/2019 12:42 AM, Stephen Hemminger wrote:
>
> > On Mon, 25 Mar 2019 15:09:50 -0700
> > Matthew Garrett matthewgarrett@google.com wrote:
> >
> > > From: David Howells dhowells@redhat.com
> > > There are some bpf functions can be used to read kernel memory:
> > > bpf_probe_read, bpf_probe_write_user and bpf_trace_printk. These allow
> > > private keys in kernel memory (e.g. the hibernation image signing key) to
> > > be read by an eBPF program and kernel memory to be altered without
> > > restriction.
>
> I'm not sure where 'kernel memory to be altered without restriction' comes
> from, but it's definitely a wrong statement.
>
> > > Completely prohibit the use of BPF when the kernel is locked down.
>
> In which scenarios will the lock-down mode be used? Mostly niche? I'm asking
> as this would otherwise break a lot of existing stuff ... I'd prefer you find
> a better solution to this than this straight -EPERM rejection.


AFAIK this change breaks IPAddressAllow/IPAddressDeny usage in systemd services
which makes them LESS secure.

https://www.freedesktop.org/software/systemd/man/systemd.resource-control.html
https://github.com/systemd/systemd/blob/04d7ca022843913fba5170c40be07acf2ab5902b/README#L96

Jordan
James Morris March 26, 2019, 6:57 p.m. UTC | #6
On Mon, 25 Mar 2019, Andy Lutomirski wrote:

> A while back, I suggested an approach to actually make this stuff
> mergeable: submit a patch series that adds lockdown mode, enables it
> by command line option (and maybe sysctl) *only* and has either no
> effect or only a token effect.  Then we can add actual features to
> lockdown mode one at a time and review them separately.

This makes sense to me.

> 
> And I'm going to complain loudly unless two things change about this
> whole thing:
> 
> 1. Lockdown mode becomes three states, not a boolean.  The states are:
> no lockdown, best-effort-to-protect-kernel-integrity, and
> best-effort-to-protect-kernel-secrecy-and-integrity.  And this BPF
> mess illustrates why: most users will really strongly object to
> turning off BPF when they actually just want to protect kernel
> integrity.  And as far as I know, things like Secure Boot policy will
> mostly care about integrity, not secrecy, and tracing and such should
> work on a normal locked-down kernel.  So I think we need this knob.

Another approach would be to make this entirely policy based:

- Assign an ID to each lockdown point
- Implement a policy mechanism where each ID is mapped to 0 or 1
- Allow this policy to be specified statically or dynamically

So, 

	kernel_is_locked_down("ioperm")

becomes

	kernel_is_locked_down(LOCKDOWN_IOPERM)

and this function checks e.g.

	if (lockdown_polcy[id]) {
		fail or warn;
        }

Thoughts?


> 2. All the proponents of this series, and the documentation, needs to
> document that it's best effort.  There will always be security bugs,
> and there will always be things we miss.

Right.  Maintaining this feature will be an ongoing effort, and if its not 
actively maintained, it will bitrot and become useless.
Andy Lutomirski March 26, 2019, 7:22 p.m. UTC | #7
On Tue, Mar 26, 2019 at 11:57 AM James Morris <jmorris@namei.org> wrote:
>
> On Mon, 25 Mar 2019, Andy Lutomirski wrote:
>
> > A while back, I suggested an approach to actually make this stuff
> > mergeable: submit a patch series that adds lockdown mode, enables it
> > by command line option (and maybe sysctl) *only* and has either no
> > effect or only a token effect.  Then we can add actual features to
> > lockdown mode one at a time and review them separately.
>
> This makes sense to me.
>
> >
> > And I'm going to complain loudly unless two things change about this
> > whole thing:
> >
> > 1. Lockdown mode becomes three states, not a boolean.  The states are:
> > no lockdown, best-effort-to-protect-kernel-integrity, and
> > best-effort-to-protect-kernel-secrecy-and-integrity.  And this BPF
> > mess illustrates why: most users will really strongly object to
> > turning off BPF when they actually just want to protect kernel
> > integrity.  And as far as I know, things like Secure Boot policy will
> > mostly care about integrity, not secrecy, and tracing and such should
> > work on a normal locked-down kernel.  So I think we need this knob.
>
> Another approach would be to make this entirely policy based:
>
> - Assign an ID to each lockdown point
> - Implement a policy mechanism where each ID is mapped to 0 or 1
> - Allow this policy to be specified statically or dynamically
>
> So,
>
>         kernel_is_locked_down("ioperm")
>
> becomes
>
>         kernel_is_locked_down(LOCKDOWN_IOPERM)
>
> and this function checks e.g.
>
>         if (lockdown_polcy[id]) {
>                 fail or warn;
>         }
>
> Thoughts?

I'm concerned that this gives too much useless flexibility to
administrators and user code in general.  If you can break kernel
integrity, you can break kernel integrity -- it shouldn't really
matter *how* you break it.

--Andy
Matthew Garrett March 26, 2019, 8:19 p.m. UTC | #8
On Tue, Mar 26, 2019 at 11:57 AM James Morris <jmorris@namei.org> wrote:
> - Assign an ID to each lockdown point
> - Implement a policy mechanism where each ID is mapped to 0 or 1
> - Allow this policy to be specified statically or dynamically

One of the problems with this approach is what the default behaviour
should be when a new feature is added. If an admin fails to notice
that there's now a new policy element, they run the risk of kernel
integrity being compromised via the new feature even if the rest of
the kernel is locked down.
James Morris March 28, 2019, 3:15 a.m. UTC | #9
On Tue, 26 Mar 2019, Andy Lutomirski wrote:

> >
> >         kernel_is_locked_down("ioperm")
> >
> > becomes
> >
> >         kernel_is_locked_down(LOCKDOWN_IOPERM)
> >
> > and this function checks e.g.
> >
> >         if (lockdown_polcy[id]) {
> >                 fail or warn;
> >         }
> >
> > Thoughts?
> 
> I'm concerned that this gives too much useless flexibility to
> administrators and user code in general.  If you can break kernel
> integrity, you can break kernel integrity -- it shouldn't really
> matter *how* you break it.

OTOH, this seems like a combination of mechanism and policy. The 3 modes 
are a help here, but I wonder if they may be too coarse grained still, 
e.g. if someone wants to allow a specific mechanism according to their own 
threat model and mitigations.

Secure boot gives you some assurance of the static state of the system at 
boot time, and lockdown is certainly useful (with or without secure boot), 
but it's not a complete solution to runtime kernel integrity protection by 
any stretch of the imagination.  I'm concerned about it being perceived as 
such.

I'm not sure how to think about it architecturally and how it fits as such 
in the mainline kernel.
Matthew Garrett March 28, 2019, 6:07 p.m. UTC | #10
On Wed, Mar 27, 2019 at 8:15 PM James Morris <jmorris@namei.org> wrote:
> OTOH, this seems like a combination of mechanism and policy. The 3 modes
> are a help here, but I wonder if they may be too coarse grained still,
> e.g. if someone wants to allow a specific mechanism according to their own
> threat model and mitigations.

In general the interfaces blocked by these patches could also be
blocked with an LSM, and I'd guess that people with more fine-grained
requirements would probably take that approach.

> Secure boot gives you some assurance of the static state of the system at
> boot time, and lockdown is certainly useful (with or without secure boot),
> but it's not a complete solution to runtime kernel integrity protection by
> any stretch of the imagination.  I'm concerned about it being perceived as
> such.

What do you think the functionality gaps are in terms of ensuring
kernel integrity (other than kernel flaws that allow the restrictions
to be bypassed)?
James Morris March 28, 2019, 7:23 p.m. UTC | #11
On Thu, 28 Mar 2019, Matthew Garrett wrote:

> On Wed, Mar 27, 2019 at 8:15 PM James Morris <jmorris@namei.org> wrote:
> > OTOH, this seems like a combination of mechanism and policy. The 3 modes
> > are a help here, but I wonder if they may be too coarse grained still,
> > e.g. if someone wants to allow a specific mechanism according to their own
> > threat model and mitigations.
> 
> In general the interfaces blocked by these patches could also be
> blocked with an LSM, and I'd guess that people with more fine-grained
> requirements would probably take that approach.

So... I have to ask, why not use LSM for this in the first place?

Either with an existing module or perhaps a lockdown LSM?

> 
> > Secure boot gives you some assurance of the static state of the system at
> > boot time, and lockdown is certainly useful (with or without secure boot),
> > but it's not a complete solution to runtime kernel integrity protection by
> > any stretch of the imagination.  I'm concerned about it being perceived as
> > such.
> 
> What do you think the functionality gaps are in terms of ensuring
> kernel integrity (other than kernel flaws that allow the restrictions
> to be bypassed)?

I don't know of any non-flaw gaps.
Matthew Garrett March 28, 2019, 8:08 p.m. UTC | #12
On Thu, Mar 28, 2019 at 12:23 PM James Morris <jmorris@namei.org> wrote:
>
> On Thu, 28 Mar 2019, Matthew Garrett wrote:
>
> > On Wed, Mar 27, 2019 at 8:15 PM James Morris <jmorris@namei.org> wrote:
> > > OTOH, this seems like a combination of mechanism and policy. The 3 modes
> > > are a help here, but I wonder if they may be too coarse grained still,
> > > e.g. if someone wants to allow a specific mechanism according to their own
> > > threat model and mitigations.
> >
> > In general the interfaces blocked by these patches could also be
> > blocked with an LSM, and I'd guess that people with more fine-grained
> > requirements would probably take that approach.
>
> So... I have to ask, why not use LSM for this in the first place?
>
> Either with an existing module or perhaps a lockdown LSM?

Some of it isn't really achievable that way - for instance, enforcing
module or kexec signatures. We have other mechanisms that can be used
to enable that which could be done at the more fine-grained level, but
a design goal was to make it possible to automatically enable a full
set of integrity protections under specified circumstances.
diff mbox series

Patch

diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index b155cd17c1bd..2cde39a875aa 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -2585,6 +2585,9 @@  SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, siz
 	if (sysctl_unprivileged_bpf_disabled && !capable(CAP_SYS_ADMIN))
 		return -EPERM;
 
+	if (kernel_is_locked_down("BPF"))
+		return -EPERM;
+
 	err = bpf_check_uarg_tail_zero(uattr, sizeof(attr), size);
 	if (err)
 		return err;