Message ID | 20191129222911.3710-1-daniel@iogearbox.net |
---|---|
State | Accepted |
Delegated to: | BPF Maintainers |
Series | [bpf] bpf: avoid setting bpf insns pages read-only when prog is jited |
On 11/29/19 2:29 PM, Daniel Borkmann wrote:
> For the case where the interpreter is compiled out or when the prog is jited
> it is completely unnecessary to set the BPF insn pages as read-only. In fact,
> on frequent churn of BPF programs, it could lead to performance degradation of
> the system over time since it would break the direct map down to 4k pages when
> calling set_memory_ro() for the insn buffer on x86-64 / arm64 and there is no
> reverse operation. Thus, avoid breaking up large pages for data maps, and only
> limit this to the module range used by the JIT where it is necessary to set
> the image read-only and executable.

Interesting... But why the non JIT case would need RO protection ?

Do you have any performance measures to share ?

Thanks.

On 11/30/19 2:37 AM, Eric Dumazet wrote:
> On 11/29/19 2:29 PM, Daniel Borkmann wrote:
>> For the case where the interpreter is compiled out or when the prog is jited
>> it is completely unnecessary to set the BPF insn pages as read-only. In fact,
>> on frequent churn of BPF programs, it could lead to performance degradation of
>> the system over time since it would break the direct map down to 4k pages when
>> calling set_memory_ro() for the insn buffer on x86-64 / arm64 and there is no
>> reverse operation. Thus, avoid breaking up large pages for data maps, and only
>> limit this to the module range used by the JIT where it is necessary to set
>> the image read-only and executable.
>
> Interesting... But why the non JIT case would need RO protection ?

It was done for interpreter around 5 years ago mainly due to concerns from security
folks that the BPF insn image could get corrupted (through some other bug in the
kernel) in post-verifier stage by an attacker and then there's nothing really that
would provide any sort of protection guarantees; pretty much the same reasons why
e.g. modules are set to read-only in the kernel.

> Do you have any performance measures to share ?

No numbers, and I'm also not aware of any reports from users, but it was recently
brought to our attention from mm folks during discussion of a different set:

https://lore.kernel.org/lkml/1572171452-7958-2-git-send-email-rppt@kernel.org/T/

Thanks,
Daniel

On Sat, Nov 30, 2019 at 1:52 AM Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> On 11/30/19 2:37 AM, Eric Dumazet wrote:
> > On 11/29/19 2:29 PM, Daniel Borkmann wrote:
> >> For the case where the interpreter is compiled out or when the prog is jited
> >> it is completely unnecessary to set the BPF insn pages as read-only. In fact,
> >> on frequent churn of BPF programs, it could lead to performance degradation of
> >> the system over time since it would break the direct map down to 4k pages when
> >> calling set_memory_ro() for the insn buffer on x86-64 / arm64 and there is no
> >> reverse operation. Thus, avoid breaking up large pages for data maps, and only
> >> limit this to the module range used by the JIT where it is necessary to set
> >> the image read-only and executable.
> >
> > Interesting... But why the non JIT case would need RO protection ?
>
> It was done for interpreter around 5 years ago mainly due to concerns from security
> folks that the BPF insn image could get corrupted (through some other bug in the
> kernel) in post-verifier stage by an attacker and then there's nothing really that
> would provide any sort of protection guarantees; pretty much the same reasons why
> e.g. modules are set to read-only in the kernel.
>
> > Do you have any performance measures to share ?
>
> No numbers, and I'm also not aware of any reports from users, but it was recently
> brought to our attention from mm folks during discussion of a different set:
>
> https://lore.kernel.org/lkml/1572171452-7958-2-git-send-email-rppt@kernel.org/T/

Applied. Thanks

On 11/30/19 1:52 AM, Daniel Borkmann wrote:
> On 11/30/19 2:37 AM, Eric Dumazet wrote:
>> On 11/29/19 2:29 PM, Daniel Borkmann wrote:
>>> For the case where the interpreter is compiled out or when the prog is jited
>>> it is completely unnecessary to set the BPF insn pages as read-only. In fact,
>>> on frequent churn of BPF programs, it could lead to performance degradation of
>>> the system over time since it would break the direct map down to 4k pages when
>>> calling set_memory_ro() for the insn buffer on x86-64 / arm64 and there is no
>>> reverse operation. Thus, avoid breaking up large pages for data maps, and only
>>> limit this to the module range used by the JIT where it is necessary to set
>>> the image read-only and executable.
>>
>> Interesting... But why the non JIT case would need RO protection ?
>
> It was done for interpreter around 5 years ago mainly due to concerns from security
> folks that the BPF insn image could get corrupted (through some other bug in the
> kernel) in post-verifier stage by an attacker and then there's nothing really that
> would provide any sort of protection guarantees; pretty much the same reasons why
> e.g. modules are set to read-only in the kernel.
>
>> Do you have any performance measures to share ?
>
> No numbers, and I'm also not aware of any reports from users, but it was recently
> brought to our attention from mm folks during discussion of a different set:
>
> https://lore.kernel.org/lkml/1572171452-7958-2-git-send-email-rppt@kernel.org/T/
>

Thanks for the link !

Having RO protection as a debug feature would be useful.

I believe we have CONFIG_STRICT_MODULE_RWX (and CONFIG_STRICT_KERNEL_RWX) for that already.

Or are we saying we also want to get rid of them ?

On December 1, 2019 6:49:32 PM PST, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
>
> On 11/30/19 1:52 AM, Daniel Borkmann wrote:
>> On 11/30/19 2:37 AM, Eric Dumazet wrote:
>>> On 11/29/19 2:29 PM, Daniel Borkmann wrote:
>>>> For the case where the interpreter is compiled out or when the prog is jited
>>>> it is completely unnecessary to set the BPF insn pages as read-only. In fact,
>>>> on frequent churn of BPF programs, it could lead to performance degradation of
>>>> the system over time since it would break the direct map down to 4k pages when
>>>> calling set_memory_ro() for the insn buffer on x86-64 / arm64 and there is no
>>>> reverse operation. Thus, avoid breaking up large pages for data maps, and only
>>>> limit this to the module range used by the JIT where it is necessary to set
>>>> the image read-only and executable.
>>>
>>> Interesting... But why the non JIT case would need RO protection ?
>>
>> It was done for interpreter around 5 years ago mainly due to concerns from security
>> folks that the BPF insn image could get corrupted (through some other bug in the
>> kernel) in post-verifier stage by an attacker and then there's nothing really that
>> would provide any sort of protection guarantees; pretty much the same reasons why
>> e.g. modules are set to read-only in the kernel.
>>
>>> Do you have any performance measures to share ?
>>
>> No numbers, and I'm also not aware of any reports from users, but it was recently
>> brought to our attention from mm folks during discussion of a different set:
>>
>> https://lore.kernel.org/lkml/1572171452-7958-2-git-send-email-rppt@kernel.org/T/
>>
>
> Thanks for the link !
>
> Having RO protection as a debug feature would be useful.
>
> I believe we have CONFIG_STRICT_MODULE_RWX (and CONFIG_STRICT_KERNEL_RWX) for that already.
>
> Or are we saying we also want to get rid of them ?

The notion is that for security there should never be a page which is both
writable and executable at the same time. This makes it harder to inject code.

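As a userspace analogy for the W^X rule described above (not kernel code, and not part of this patch), the sketch below fills an anonymous page while it is writable and then switches it to read+execute with mprotect(), so the page is never writable and executable at once. The function name wx_demo is hypothetical.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Userspace analogy only: copy bytes into a page while it is writable,
 * then drop write permission in the same call that grants execute
 * permission. Returns 0 on success, -1 on failure.
 */
int wx_demo(const unsigned char *code, size_t len)
{
	long page = sysconf(_SC_PAGESIZE);

	assert(code != NULL);
	if (page <= 0 || len > (size_t)page)
		return -1;

	/* Step 1: readable and writable, not executable. */
	void *buf = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED)
		return -1;
	memcpy(buf, code, len);

	/* Step 2: readable and executable, no longer writable. */
	int ret = mprotect(buf, (size_t)page, PROT_READ | PROT_EXEC);

	munmap(buf, (size_t)page);
	return ret;
}
```

The sketch never jumps into the buffer; it only demonstrates the permission transition. Some hardened systems may refuse the second step for anonymous memory, in which case wx_demo() returns -1.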
On Sun, Dec 01, 2019 at 06:49:32PM -0800, Eric Dumazet wrote:
> Thanks for the link !
>
> Having RO protection as a debug feature would be useful.
>
> I believe we have CONFIG_STRICT_MODULE_RWX (and CONFIG_STRICT_KERNEL_RWX) for that already.
>
> Or are we saying we also want to get rid of them ?

No, in fact I'm working on making that stronger. We currently still have
a few cases that violate the W^X rule.

The thing is, when the BPF stuff is JIT'ed, the actual BPF instruction
page is not actually executed at all, so making it RO serves no purpose,
other than to fragment the direct map.

All actual code lives in the 2G range that x86_64 can directly branch
to, but this BPF instruction stuff lives in the general data heap and
can thus cause much more fragmentation of the direct map.

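To put a rough number on the fragmentation concern: assuming the direct map covers the affected memory with 2 MiB large pages on x86-64 (an assumption here; the thread only says the map is broken down to 4k pages), changing the permissions of a single 4 KiB page inside such a mapping means replacing one large mapping with many small ones. The helper name below is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define BASE_PAGE_SIZE  (4ULL << 10)  /* 4 KiB base page */
#define LARGE_PAGE_SIZE (2ULL << 20)  /* assumed 2 MiB large page */

/* Number of 4 KiB mappings needed to cover what one large mapping covered. */
uint64_t small_mappings_after_split(uint64_t large_page_size)
{
	/* A large page is expected to be a whole multiple of the base page. */
	assert(large_page_size % BASE_PAGE_SIZE == 0);
	return large_page_size / BASE_PAGE_SIZE;
}
```

Under these assumptions small_mappings_after_split(LARGE_PAGE_SIZE) is 512, and since the commit message notes there is no reverse operation, every such split stays in place after the program is freed.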
On Mon, Dec 02, 2019 at 09:30:06AM +0100, Peter Zijlstra wrote:
> On Sun, Dec 01, 2019 at 06:49:32PM -0800, Eric Dumazet wrote:
>
> > Thanks for the link !
> >
> > Having RO protection as a debug feature would be useful.
> >
> > I believe we have CONFIG_STRICT_MODULE_RWX (and CONFIG_STRICT_KERNEL_RWX) for that already.
> >
> > Or are we saying we also want to get rid of them ?
>
> No, in fact I'm working on making that stronger. We currently still have
> a few cases that violate the W^X rule.
>
> The thing is, when the BPF stuff is JIT'ed, the actual BPF instruction
> page is not actually executed at all, so making it RO serves no purpose,
> other than to fragment the direct map.

Yes exactly, in that case it is only used for dumping the BPF insns back
to user space and therefore no need at all to set it RO. (The JITed image
however *is* set as RO. - Perhaps there was some confusion given your
earlier question.)

Thanks,
Daniel

On Mon, Dec 2, 2019 at 1:17 AM Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> On Mon, Dec 02, 2019 at 09:30:06AM +0100, Peter Zijlstra wrote:
> > On Sun, Dec 01, 2019 at 06:49:32PM -0800, Eric Dumazet wrote:
> >
> > > Thanks for the link !
> > >
> > > Having RO protection as a debug feature would be useful.
> > >
> > > I believe we have CONFIG_STRICT_MODULE_RWX (and CONFIG_STRICT_KERNEL_RWX) for that already.
> > >
> > > Or are we saying we also want to get rid of them ?
> >
> > No, in fact I'm working on making that stronger. We currently still have
> > a few cases that violate the W^X rule.
> >
> > The thing is, when the BPF stuff is JIT'ed, the actual BPF instruction
> > page is not actually executed at all, so making it RO serves no purpose,
> > other than to fragment the direct map.
>
> Yes exactly, in that case it is only used for dumping the BPF insns back
> to user space and therefore no need at all to set it RO. (The JITed image
> however *is* set as RO. - Perhaps there was some confusion given your
> earlier question.)

May be we should also flip the default to net.core.bpf_jit_enable=1
for x86-64 ? and may be arm64 ? These two JITs are well tested
and maintained.

On Mon, Dec 02, 2019 at 08:19:45AM -0800, Alexei Starovoitov wrote:
> On Mon, Dec 2, 2019 at 1:17 AM Daniel Borkmann <daniel@iogearbox.net> wrote:
> > On Mon, Dec 02, 2019 at 09:30:06AM +0100, Peter Zijlstra wrote:
> > > On Sun, Dec 01, 2019 at 06:49:32PM -0800, Eric Dumazet wrote:
> > >
> > > > Thanks for the link !
> > > >
> > > > Having RO protection as a debug feature would be useful.
> > > >
> > > > I believe we have CONFIG_STRICT_MODULE_RWX (and CONFIG_STRICT_KERNEL_RWX) for that already.
> > > >
> > > > Or are we saying we also want to get rid of them ?
> > >
> > > No, in fact I'm working on making that stronger. We currently still have
> > > a few cases that violate the W^X rule.
> > >
> > > The thing is, when the BPF stuff is JIT'ed, the actual BPF instruction
> > > page is not actually executed at all, so making it RO serves no purpose,
> > > other than to fragment the direct map.
> >
> > Yes exactly, in that case it is only used for dumping the BPF insns back
> > to user space and therefore no need at all to set it RO. (The JITed image
> > however *is* set as RO. - Perhaps there was some confusion given your
> > earlier question.)
>
> May be we should also flip the default to net.core.bpf_jit_enable=1
> for x86-64 ? and may be arm64 ? These two JITs are well tested
> and maintained.

Seems reasonable given their status and exposure they've had over the years.
I can follow-up on that.

Thanks,
Daniel

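For reference, the net.core.bpf_jit_enable setting discussed above is exposed as /proc/sys/net/core/bpf_jit_enable. A minimal sketch for reading it, assuming the documented values 0 (JIT disabled), 1 (JIT enabled) and 2 (JIT enabled with debug output to the kernel log); both function names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Parse the sysctl text; returns 0, 1 or 2, or -1 if it is not a known mode. */
int parse_bpf_jit_enable(const char *text)
{
	char *end;
	long v;

	assert(text != NULL);
	errno = 0;
	v = strtol(text, &end, 10);
	if (errno != 0 || end == text || v < 0 || v > 2)
		return -1;
	return (int)v;
}

/* Read the live setting; returns -1 if the file is missing or unreadable. */
int read_bpf_jit_enable(void)
{
	char buf[16];
	FILE *f = fopen("/proc/sys/net/core/bpf_jit_enable", "r");

	if (!f)
		return -1;
	char *ok = fgets(buf, sizeof(buf), f);
	fclose(f);
	return ok ? parse_bpf_jit_enable(buf) : -1;
}
```

A caller could print the result of read_bpf_jit_enable() to check whether programs on the running system are expected to be JITed; a return of -1 only means the value could not be determined.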
diff --git a/include/linux/filter.h b/include/linux/filter.h
index 1b1e8b8f88da..a141cb07e76a 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -776,8 +776,12 @@ bpf_ctx_narrow_access_offset(u32 off, u32 size, u32 size_default)
 
 static inline void bpf_prog_lock_ro(struct bpf_prog *fp)
 {
-	set_vm_flush_reset_perms(fp);
-	set_memory_ro((unsigned long)fp, fp->pages);
+#ifndef CONFIG_BPF_JIT_ALWAYS_ON
+	if (!fp->jited) {
+		set_vm_flush_reset_perms(fp);
+		set_memory_ro((unsigned long)fp, fp->pages);
+	}
+#endif
 }
 
 static inline void bpf_jit_binary_lock_ro(struct bpf_binary_header *hdr)

For the case where the interpreter is compiled out or when the prog is jited
it is completely unnecessary to set the BPF insn pages as read-only. In fact,
on frequent churn of BPF programs, it could lead to performance degradation of
the system over time since it would break the direct map down to 4k pages when
calling set_memory_ro() for the insn buffer on x86-64 / arm64 and there is no
reverse operation. Thus, avoid breaking up large pages for data maps, and only
limit this to the module range used by the JIT where it is necessary to set
the image read-only and executable.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
---
 include/linux/filter.h | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)