diff mbox series

[bpf] bpf: avoid setting bpf insns pages read-only when prog is jited

Message ID 20191129222911.3710-1-daniel@iogearbox.net
State Accepted
Delegated to: BPF Maintainers

Commit Message

Daniel Borkmann Nov. 29, 2019, 10:29 p.m. UTC
For the case where the interpreter is compiled out or when the prog is jited
it is completely unnecessary to set the BPF insn pages as read-only. In fact,
on frequent churn of BPF programs, it could lead to performance degradation of
the system over time since it would break the direct map down to 4k pages when
calling set_memory_ro() for the insn buffer on x86-64 / arm64 and there is no
reverse operation. Thus, avoid breaking up large pages for data maps, and only
limit this to the module range used by the JIT where it is necessary to set
the image read-only and executable.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
---
 include/linux/filter.h | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

Comments

Eric Dumazet Nov. 30, 2019, 1:37 a.m. UTC | #1
On 11/29/19 2:29 PM, Daniel Borkmann wrote:
> For the case where the interpreter is compiled out or when the prog is jited
> it is completely unnecessary to set the BPF insn pages as read-only. In fact,
> on frequent churn of BPF programs, it could lead to performance degradation of
> the system over time since it would break the direct map down to 4k pages when
> calling set_memory_ro() for the insn buffer on x86-64 / arm64 and there is no
> reverse operation. Thus, avoid breaking up large pages for data maps, and only
> limit this to the module range used by the JIT where it is necessary to set
> the image read-only and executable.

Interesting... But why the non JIT case would need RO protection ?

Do you have any performance measures to share ?

Thanks.
Daniel Borkmann Nov. 30, 2019, 9:52 a.m. UTC | #2
On 11/30/19 2:37 AM, Eric Dumazet wrote:
> On 11/29/19 2:29 PM, Daniel Borkmann wrote:
>> For the case where the interpreter is compiled out or when the prog is jited
>> it is completely unnecessary to set the BPF insn pages as read-only. In fact,
>> on frequent churn of BPF programs, it could lead to performance degradation of
>> the system over time since it would break the direct map down to 4k pages when
>> calling set_memory_ro() for the insn buffer on x86-64 / arm64 and there is no
>> reverse operation. Thus, avoid breaking up large pages for data maps, and only
>> limit this to the module range used by the JIT where it is necessary to set
>> the image read-only and executable.
> 
> Interesting... But why the non JIT case would need RO protection ?

It was done for interpreter around 5 years ago mainly due to concerns from security
folks that the BPF insn image could get corrupted (through some other bug in the
kernel) in post-verifier stage by an attacker and then there's nothing really that
would provide any sort of protection guarantees; pretty much the same reasons why
e.g. modules are set to read-only in the kernel.

> Do you have any performance measures to share ?

No numbers, and I'm also not aware of any reports from users, but it was recently
brought to our attention from mm folks during discussion of a different set:

https://lore.kernel.org/lkml/1572171452-7958-2-git-send-email-rppt@kernel.org/T/

Thanks,
Daniel
Alexei Starovoitov Dec. 1, 2019, 5:54 p.m. UTC | #3
On Sat, Nov 30, 2019 at 1:52 AM Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> On 11/30/19 2:37 AM, Eric Dumazet wrote:
> > On 11/29/19 2:29 PM, Daniel Borkmann wrote:
> >> For the case where the interpreter is compiled out or when the prog is jited
> >> it is completely unnecessary to set the BPF insn pages as read-only. In fact,
> >> on frequent churn of BPF programs, it could lead to performance degradation of
> >> the system over time since it would break the direct map down to 4k pages when
> >> calling set_memory_ro() for the insn buffer on x86-64 / arm64 and there is no
> >> reverse operation. Thus, avoid breaking up large pages for data maps, and only
> >> limit this to the module range used by the JIT where it is necessary to set
> >> the image read-only and executable.
> >
> > Interesting... But why the non JIT case would need RO protection ?
>
> It was done for interpreter around 5 years ago mainly due to concerns from security
> folks that the BPF insn image could get corrupted (through some other bug in the
> kernel) in post-verifier stage by an attacker and then there's nothing really that
> would provide any sort of protection guarantees; pretty much the same reasons why
> e.g. modules are set to read-only in the kernel.
>
> > Do you have any performance measures to share ?
>
> No numbers, and I'm also not aware of any reports from users, but it was recently
> brought to our attention from mm folks during discussion of a different set:
>
> https://lore.kernel.org/lkml/1572171452-7958-2-git-send-email-rppt@kernel.org/T/

Applied. Thanks
Eric Dumazet Dec. 2, 2019, 2:49 a.m. UTC | #4
On 11/30/19 1:52 AM, Daniel Borkmann wrote:
> On 11/30/19 2:37 AM, Eric Dumazet wrote:
>> On 11/29/19 2:29 PM, Daniel Borkmann wrote:
>>> For the case where the interpreter is compiled out or when the prog is jited
>>> it is completely unnecessary to set the BPF insn pages as read-only. In fact,
>>> on frequent churn of BPF programs, it could lead to performance degradation of
>>> the system over time since it would break the direct map down to 4k pages when
>>> calling set_memory_ro() for the insn buffer on x86-64 / arm64 and there is no
>>> reverse operation. Thus, avoid breaking up large pages for data maps, and only
>>> limit this to the module range used by the JIT where it is necessary to set
>>> the image read-only and executable.
>>
>> Interesting... But why the non JIT case would need RO protection ?
> 
> It was done for interpreter around 5 years ago mainly due to concerns from security
> folks that the BPF insn image could get corrupted (through some other bug in the
> kernel) in post-verifier stage by an attacker and then there's nothing really that
> would provide any sort of protection guarantees; pretty much the same reasons why
> e.g. modules are set to read-only in the kernel.
> 
>> Do you have any performance measures to share ?
> 
> No numbers, and I'm also not aware of any reports from users, but it was recently
> brought to our attention from mm folks during discussion of a different set:
> 
> https://lore.kernel.org/lkml/1572171452-7958-2-git-send-email-rppt@kernel.org/T/
> 

Thanks for the link !

Having RO protection as a debug feature would be useful.

I believe we have CONFIG_STRICT_MODULE_RWX (and CONFIG_STRICT_KERNEL_RWX) for that already.

Or are we saying we also want to get rid of them ?
H. Peter Anvin Dec. 2, 2019, 3:44 a.m. UTC | #5
On December 1, 2019 6:49:32 PM PST, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
>
>On 11/30/19 1:52 AM, Daniel Borkmann wrote:
>> On 11/30/19 2:37 AM, Eric Dumazet wrote:
>>> On 11/29/19 2:29 PM, Daniel Borkmann wrote:
>>>> For the case where the interpreter is compiled out or when the prog is jited
>>>> it is completely unnecessary to set the BPF insn pages as read-only. In fact,
>>>> on frequent churn of BPF programs, it could lead to performance degradation of
>>>> the system over time since it would break the direct map down to 4k pages when
>>>> calling set_memory_ro() for the insn buffer on x86-64 / arm64 and there is no
>>>> reverse operation. Thus, avoid breaking up large pages for data maps, and only
>>>> limit this to the module range used by the JIT where it is necessary to set
>>>> the image read-only and executable.
>>>
>>> Interesting... But why the non JIT case would need RO protection ?
>>
>> It was done for interpreter around 5 years ago mainly due to concerns from security
>> folks that the BPF insn image could get corrupted (through some other bug in the
>> kernel) in post-verifier stage by an attacker and then there's nothing really that
>> would provide any sort of protection guarantees; pretty much the same reasons why
>> e.g. modules are set to read-only in the kernel.
>>
>>> Do you have any performance measures to share ?
>>
>> No numbers, and I'm also not aware of any reports from users, but it was recently
>> brought to our attention from mm folks during discussion of a different set:
>>
>> https://lore.kernel.org/lkml/1572171452-7958-2-git-send-email-rppt@kernel.org/T/
>>
>
>Thanks for the link !
>
>Having RO protection as a debug feature would be useful.
>
>I believe we have CONFIG_STRICT_MODULE_RWX (and CONFIG_STRICT_KERNEL_RWX) for that already.
>
>Or are we saying we also want to get rid of them ?

The notion is that for security there should never be a page which is both writable and executable at the same time. This makes it harder to inject code.
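
As a rough user-space illustration of that rule (not kernel code), the check below flags a protection mask that is writable and executable at once. The helper name prot_violates_wx() is hypothetical; only the PROT_* constants from <sys/mman.h> are real.

```c
#include <stdbool.h>
#include <sys/mman.h>

/*
 * Hypothetical helper, not a kernel API: returns true when a protection
 * mask is both writable and executable, i.e. when it violates W^X.
 */
static bool prot_violates_wx(int prot)
{
	return (prot & PROT_WRITE) && (prot & PROT_EXEC);
}
```

A read-write data page and a read-execute code page both pass; only the combined write-plus-execute mask is rejected.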
Peter Zijlstra Dec. 2, 2019, 8:30 a.m. UTC | #6
On Sun, Dec 01, 2019 at 06:49:32PM -0800, Eric Dumazet wrote:

> Thanks for the link !
> 
> Having RO protection as a debug feature would be useful.
> 
> I believe we have CONFIG_STRICT_MODULE_RWX (and CONFIG_STRICT_KERNEL_RWX) for that already.
> 
> Or are we saying we also want to get rid of them ?

No, in fact I'm working on making that stronger. We currently still have
a few cases that violate the W^X rule.

The thing is, when the BPF stuff is JIT'ed, the actual BPF instruction
page is not actually executed at all, so making it RO serves no purpose,
other than to fragment the direct map.

All actual code lives in the 2G range that x86_64 can directly branch
to, but this BPF instruction stuff lives in the general data heap and
can thus cause much more fragmentation of the direct map.
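
To put a number on that fragmentation, here is a small arithmetic sketch. It assumes the common x86-64 case of 4 KiB base pages with 2 MiB (or 1 GiB) large mappings in the direct map; the macro and function names are made up for illustration.

```c
/* Assumed page sizes for x86-64; other configurations differ. */
#define SZ_4K (4UL * 1024)
#define SZ_2M (2UL * 1024 * 1024)
#define SZ_1G (1024UL * 1024 * 1024)

/*
 * Hypothetical helper: number of 4 KiB entries needed to cover one
 * large mapping once it has been split all the way down.
 */
static unsigned long ptes_after_split(unsigned long large_size)
{
	return large_size / SZ_4K;
}
```

Under these assumptions, changing permissions on a single 4 KiB range inside a 2 MiB mapping replaces one entry with 512, and since there is no reverse operation they stay split.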
Daniel Borkmann Dec. 2, 2019, 9:17 a.m. UTC | #7
On Mon, Dec 02, 2019 at 09:30:06AM +0100, Peter Zijlstra wrote:
> On Sun, Dec 01, 2019 at 06:49:32PM -0800, Eric Dumazet wrote:
> 
> > Thanks for the link !
> > 
> > Having RO protection as a debug feature would be useful.
> > 
> > I believe we have CONFIG_STRICT_MODULE_RWX (and CONFIG_STRICT_KERNEL_RWX) for that already.
> > 
> > Or are we saying we also want to get rid of them ?
> 
> No, in fact I'm working on making that stronger. We currently still have
> a few cases that violate the W^X rule.
> 
> The thing is, when the BPF stuff is JIT'ed, the actual BPF instruction
> page is not actually executed at all, so making it RO serves no purpose,
> other than to fragment the direct map.

Yes exactly, in that case it is only used for dumping the BPF insns back
to user space and therefore no need at all to set it RO. (The JITed image
however *is* set as RO. - Perhaps there was some confusion given your
earlier question.)

Thanks,
Daniel
Alexei Starovoitov Dec. 2, 2019, 4:19 p.m. UTC | #8
On Mon, Dec 2, 2019 at 1:17 AM Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> On Mon, Dec 02, 2019 at 09:30:06AM +0100, Peter Zijlstra wrote:
> > On Sun, Dec 01, 2019 at 06:49:32PM -0800, Eric Dumazet wrote:
> >
> > > Thanks for the link !
> > >
> > > Having RO protection as a debug feature would be useful.
> > >
> > > I believe we have CONFIG_STRICT_MODULE_RWX (and CONFIG_STRICT_KERNEL_RWX) for that already.
> > >
> > > Or are we saying we also want to get rid of them ?
> >
> > No, in fact I'm working on making that stronger. We currently still have
> > a few cases that violate the W^X rule.
> >
> > The thing is, when the BPF stuff is JIT'ed, the actual BPF instruction
> > page is not actually executed at all, so making it RO serves no purpose,
> > other than to fragment the direct map.
>
> Yes exactly, in that case it is only used for dumping the BPF insns back
> to user space and therefore no need at all to set it RO. (The JITed image
> however *is* set as RO. - Perhaps there was some confusion given your
> earlier question.)

Maybe we should also flip the default to net.core.bpf_jit_enable=1
for x86-64 ? and maybe arm64 ? These two JITs are well tested
and maintained.
Daniel Borkmann Dec. 2, 2019, 8:09 p.m. UTC | #9
On Mon, Dec 02, 2019 at 08:19:45AM -0800, Alexei Starovoitov wrote:
> On Mon, Dec 2, 2019 at 1:17 AM Daniel Borkmann <daniel@iogearbox.net> wrote:
> > On Mon, Dec 02, 2019 at 09:30:06AM +0100, Peter Zijlstra wrote:
> > > On Sun, Dec 01, 2019 at 06:49:32PM -0800, Eric Dumazet wrote:
> > >
> > > > Thanks for the link !
> > > >
> > > > Having RO protection as a debug feature would be useful.
> > > >
> > > > I believe we have CONFIG_STRICT_MODULE_RWX (and CONFIG_STRICT_KERNEL_RWX) for that already.
> > > >
> > > > Or are we saying we also want to get rid of them ?
> > >
> > > No, in fact I'm working on making that stronger. We currently still have
> > > a few cases that violate the W^X rule.
> > >
> > > The thing is, when the BPF stuff is JIT'ed, the actual BPF instruction
> > > page is not actually executed at all, so making it RO serves no purpose,
> > > other than to fragment the direct map.
> >
> > Yes exactly, in that case it is only used for dumping the BPF insns back
> > to user space and therefore no need at all to set it RO. (The JITed image
> > however *is* set as RO. - Perhaps there was some confusion given your
> > earlier question.)
> 
> Maybe we should also flip the default to net.core.bpf_jit_enable=1
> for x86-64 ? and maybe arm64 ? These two JITs are well tested
> and maintained.

Seems reasonable given their status and exposure they've had over the years. I
can follow-up on that.

Thanks,
Daniel
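
For anyone wanting to check what a given machine currently uses, a minimal user-space sketch follows. It reads the sysctl through its procfs path, /proc/sys/net/core/bpf_jit_enable; the two function names are hypothetical, and the read may fail without sufficient privileges or when the kernel was built without the JIT.

```c
#include <stdio.h>

/*
 * Hypothetical helper: parse the text form of net.core.bpf_jit_enable.
 * Returns the integer value, or -1 if the text is not a number.
 */
static int parse_bpf_jit_enable(const char *text)
{
	int val;

	if (sscanf(text, "%d", &val) != 1)
		return -1;
	return val;
}

/*
 * Hypothetical helper: read the current setting from procfs.
 * Returns -1 if the file is missing or unreadable.
 */
static int read_bpf_jit_enable(void)
{
	char buf[16];
	int val = -1;
	FILE *f = fopen("/proc/sys/net/core/bpf_jit_enable", "r");

	if (!f)
		return -1;
	if (fgets(buf, sizeof(buf), f))
		val = parse_bpf_jit_enable(buf);
	fclose(f);
	return val;
}
```

A result of 0 means the JIT is off and the interpreter is used; 1 means the JIT is on.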
Patch

diff --git a/include/linux/filter.h b/include/linux/filter.h
index 1b1e8b8f88da..a141cb07e76a 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -776,8 +776,12 @@  bpf_ctx_narrow_access_offset(u32 off, u32 size, u32 size_default)
 
 static inline void bpf_prog_lock_ro(struct bpf_prog *fp)
 {
-	set_vm_flush_reset_perms(fp);
-	set_memory_ro((unsigned long)fp, fp->pages);
+#ifndef CONFIG_BPF_JIT_ALWAYS_ON
+	if (!fp->jited) {
+		set_vm_flush_reset_perms(fp);
+		set_memory_ro((unsigned long)fp, fp->pages);
+	}
+#endif
 }
 
 static inline void bpf_jit_binary_lock_ro(struct bpf_binary_header *hdr)
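
The condition the hunk above introduces can be modelled in user space as below. This is only a sketch: struct prog_model and insns_need_ro() are hypothetical stand-ins, and the jit_always_on argument models whether CONFIG_BPF_JIT_ALWAYS_ON is set, which in the kernel is a compile-time #ifndef rather than a runtime flag.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the single struct bpf_prog field used here. */
struct prog_model {
	bool jited;
};

/*
 * Models the patched bpf_prog_lock_ro(): the insn pages are set
 * read-only only when the interpreter is compiled in (jit_always_on
 * is false) and this particular prog was not JITed.
 */
static bool insns_need_ro(const struct prog_model *fp, bool jit_always_on)
{
	return !jit_always_on && !fp->jited;
}
```

Only an interpreted prog on a kernel that still carries the interpreter keeps the read-only insn pages; every other combination skips set_memory_ro() for the insn buffer.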