| Message ID | 1368844623.3301.142.camel@edumazet-glaptop |
|---|---|
| State | Accepted, archived |
| Delegated to | David Miller |
On 05/18/2013 04:37 AM, Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@google.com>
>
> hpa brought to my attention some security-related issues
> with the BPF JIT on x86.
>
> This patch makes sure the BPF-generated code is marked read-only,
> like other kernel text sections.
>
> It also splits the unused space (we vmalloc() and only use a fraction of
> the page) in two parts, so that the generated BPF code does not start at
> a known offset in the page, but at a pseudo-random one.
>
> Refs:
> http://mainisusuallyafunction.blogspot.com/2012/11/attacking-hardened-linux-systems-with.html
>
> Reported-by: H. Peter Anvin <hpa@zytor.com>
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Great work! Probably other archs could later follow up and set their
generated code read-only, too.

Reviewed-by: Daniel Borkmann <dborkman@redhat.com>
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Daniel Borkmann <dborkman@redhat.com>
Date: Sun, 19 May 2013 19:02:46 +0200

> Probably other archs could later on follow-up with setting to
> read-only, too.

Only s390 and x86 support this facility.
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Fri, 17 May 2013 19:37:03 -0700

> From: Eric Dumazet <edumazet@google.com>
>
> hpa brought to my attention some security-related issues
> with the BPF JIT on x86.
>
> This patch makes sure the BPF-generated code is marked read-only,
> like other kernel text sections.
>
> It also splits the unused space (we vmalloc() and only use a fraction of
> the page) in two parts, so that the generated BPF code does not start at
> a known offset in the page, but at a pseudo-random one.
>
> Refs:
> http://mainisusuallyafunction.blogspot.com/2012/11/attacking-hardened-linux-systems-with.html
>
> Reported-by: H. Peter Anvin <hpa@zytor.com>
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Applied.
> hpa brought to my attention some security-related issues
> with the BPF JIT on x86.
>
> This patch makes sure the BPF-generated code is marked read-only,
> like other kernel text sections.
>
> It also splits the unused space (we vmalloc() and only use a fraction of
> the page) in two parts, so that the generated BPF code does not start at
> a known offset in the page, but at a pseudo-random one.
...
> +static struct bpf_binary_header *bpf_alloc_binary(unsigned int proglen,
> +						  u8 **image_ptr)
...
> +	/* insert a random number of int3 instructions before BPF code */
> +	*image_ptr = &header->image[prandom_u32() % hole];
> +	return header;
> +}

Hmmm.... anyone looking to overwrite kernel code will then start
looking for blocks of 0xcc bytes and know that what follows
is the beginning of a function.
That isn't any harder than random writes.

Copying a random part of .rodata might be better - especially
if you can find part of .rodata.str*.

	David
On 05/20/2013 10:51 AM, David Laight wrote:
>> hpa brought to my attention some security-related issues
>> with the BPF JIT on x86.
>>
>> This patch makes sure the BPF-generated code is marked read-only,
>> like other kernel text sections.
>>
>> It also splits the unused space (we vmalloc() and only use a fraction of
>> the page) in two parts, so that the generated BPF code does not start at
>> a known offset in the page, but at a pseudo-random one.
> ...
>> +static struct bpf_binary_header *bpf_alloc_binary(unsigned int proglen,
>> +						  u8 **image_ptr)
> ...
>> +	/* insert a random number of int3 instructions before BPF code */
>> +	*image_ptr = &header->image[prandom_u32() % hole];
>> +	return header;
>> +}
>
> Hmmm.... anyone looking to overwrite kernel code will then start
> looking for blocks of 0xcc bytes and know that what follows
> is the beginning of a function.
> That isn't any harder than random writes.
>
> Copying a random part of .rodata might be better - especially
> if you can find part of .rodata.str*.

Here also seems to be another approach ...

http://grsecurity.net/~spender/jit_prot.diff

via: http://www.reddit.com/r/netsec/comments/13dzhx/linux_kernel_jit_spray_for_smep_kernexec_bypass/
On Mon, 2013-05-20 at 09:51 +0100, David Laight wrote:
> Hmmm.... anyone looking to overwrite kernel code will then start
> looking for blocks of 0xcc bytes and know that what follows
> is the beginning of a function.
> That isn't any harder than random writes.
>
> Copying a random part of .rodata might be better - especially
> if you can find part of .rodata.str*.

That's not the point. We want to catch jumps to before/after the code.

An attacker with full read and write access to kernel code already has
full power to do whatever he wants.
On Mon, 2013-05-20 at 11:50 +0200, Daniel Borkmann wrote:
> Here also seems to be another approach ...
>
> http://grsecurity.net/~spender/jit_prot.diff
>
> via: http://www.reddit.com/r/netsec/comments/13dzhx/linux_kernel_jit_spray_for_smep_kernexec_bypass/

Well, there are many approaches, and I have another one as well,
provided by H. Peter Anvin.

The idea was to allow the code to be relocated outside of the 2GB space
that we use for kernel code (including module_alloc()).

So every call to a helper, coded as

	e8 xx xx xx xx		call rel32

was replaced by

	48 c7 c0 yy yy yy yy	mov $foo,%rax
	ff d0			call *%rax

The RO protection + random holes idea was a solution with no
performance impact.

Another idea is to limit the BPF JIT to root users. I do not think the
BPF JIT is mandatory at all, as tcpdump is already restricted.
Eric Dumazet <eric.dumazet@gmail.com> wrote:
> From: Eric Dumazet <edumazet@google.com>
>
> hpa brought to my attention some security-related issues
> with the BPF JIT on x86.
>
> This patch makes sure the BPF-generated code is marked read-only,
> like other kernel text sections.
>
> It also splits the unused space (we vmalloc() and only use a fraction of
> the page) in two parts, so that the generated BPF code does not start at
> a known offset in the page, but at a pseudo-random one.
>
> Refs:
> http://mainisusuallyafunction.blogspot.com/2012/11/attacking-hardened-linux-systems-with.html

What about emitting additional instructions at random locations in the
generated code itself?

E.g., after every instruction, have a random chance to insert
'xor $0xcc,%al; xor $0xcc,%al', etc.?
On Mon, 2013-05-20 at 16:19 +0200, Florian Westphal wrote:
> What about emitting additional instructions at random locations in the
> generated code itself?
>
> E.g., after every instruction, have a random chance to insert
> 'xor $0xcc,%al; xor $0xcc,%al', etc.?

This will be the last thing I'll do.

Frankly, the whole point of the BPF JIT is speed.

If we have slow code, just use the interpreter instead.
> > What about emitting additional instructions at random locations in the
> > generated code itself?
> >
> > E.g., after every instruction, have a random chance to insert
> > 'xor $0xcc,%al; xor $0xcc,%al', etc.?
>
> This will be the last thing I'll do.
>
> Frankly, the whole point of the BPF JIT is speed.
>
> If we have slow code, just use the interpreter instead.

Adding one of the standard nop opcodes wouldn't be too bad.
IIRC 0x90 is skipped very early on by modern cpus.

Adding one after every nth (or n-mth) instruction would probably break
the alternate instruction stream. However the attacker could (probably)
keep installing code patterns until the guessed pattern matched.

Also the code size changes might make the JIT compile fail - maybe
because of branch offsets, or just size.

	David
Hi Eric,

Peter talked to me about this BPF work to prevent JIT spraying attacks
at the beginning of this week, and I took a look at your patch. Some
comments:

* Meta-comment about patch structure: why was this one patch and not
  two patches? It changes two things that are orthogonal to each other
  (random offset, RW -> RO change).

* Should the NX bit be turned on while the JIT code is being prepared?

* How hard would it be to read the value of the bpf_func pointer? If an
  attacker is able to read that, it would compromise the whole
  randomization scheme.

* I loved the socket creation trick in the blog post :) Are there any
  plans to do something about it?

* How was the minimum entropy of 128 bytes chosen? The patch description
  does not explain this in any way, although it seems like a decent
  choice.

/Jarkko

On 17.05.2013 19:37, Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@google.com>
>
> hpa brought to my attention some security-related issues
> with the BPF JIT on x86.
>
> This patch makes sure the BPF-generated code is marked read-only,
> like other kernel text sections.
>
> It also splits the unused space (we vmalloc() and only use a fraction of
> the page) in two parts, so that the generated BPF code does not start at
> a known offset in the page, but at a pseudo-random one.
>
> Refs:
> http://mainisusuallyafunction.blogspot.com/2012/11/attacking-hardened-linux-systems-with.html
>
> Reported-by: H. Peter Anvin <hpa@zytor.com>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> ---
>  arch/x86/net/bpf_jit_comp.c | 53 ++++++++++++++++++++++++++++++----
>  1 file changed, 47 insertions(+), 6 deletions(-)
>
> diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
> index c0212db..79c216a 100644
> --- a/arch/x86/net/bpf_jit_comp.c
> +++ b/arch/x86/net/bpf_jit_comp.c
> @@ -12,6 +12,7 @@
>  #include <linux/netdevice.h>
>  #include <linux/filter.h>
>  #include <linux/if_vlan.h>
> +#include <linux/random.h>
>
>  /*
>   * Conventions :
> @@ -144,6 +145,39 @@ static int pkt_type_offset(void)
>  	return -1;
>  }
>
> +struct bpf_binary_header {
> +	unsigned int	pages;
> +	/* Note : for security reasons, bpf code will follow a randomly
> +	 * sized amount of int3 instructions
> +	 */
> +	u8		image[];
> +};
> +
> +static struct bpf_binary_header *bpf_alloc_binary(unsigned int proglen,
> +						  u8 **image_ptr)
> +{
> +	unsigned int sz, hole;
> +	struct bpf_binary_header *header;
> +
> +	/* Most of BPF filters are really small,
> +	 * but if some of them fill a page, allow at least
> +	 * 128 extra bytes to insert a random section of int3
> +	 */
> +	sz = round_up(proglen + sizeof(*header) + 128, PAGE_SIZE);
> +	header = module_alloc(sz);
> +	if (!header)
> +		return NULL;
> +
> +	memset(header, 0xcc, sz); /* fill whole space with int3 instructions */
> +
> +	header->pages = sz / PAGE_SIZE;
> +	hole = sz - (proglen + sizeof(*header));
> +
> +	/* insert a random number of int3 instructions before BPF code */
> +	*image_ptr = &header->image[prandom_u32() % hole];
> +	return header;
> +}
> +
>  void bpf_jit_compile(struct sk_filter *fp)
>  {
>  	u8 temp[64];
> @@ -153,6 +187,7 @@ void bpf_jit_compile(struct sk_filter *fp)
>  	int t_offset, f_offset;
>  	u8 t_op, f_op, seen = 0, pass;
>  	u8 *image = NULL;
> +	struct bpf_binary_header *header = NULL;
>  	u8 *func;
>  	int pc_ret0 = -1; /* bpf index of first RET #0 instruction (if any) */
>  	unsigned int cleanup_addr; /* epilogue code offset */
> @@ -693,7 +728,7 @@ cond_branch:			f_offset = addrs[i + filter[i].jf] - addrs[i];
>  			if (unlikely(proglen + ilen > oldproglen)) {
>  				pr_err("bpb_jit_compile fatal error\n");
>  				kfree(addrs);
> -				module_free(NULL, image);
> +				module_free(NULL, header);
>  				return;
>  			}
>  			memcpy(image + proglen, temp, ilen);
> @@ -717,8 +752,8 @@ cond_branch:			f_offset = addrs[i + filter[i].jf] - addrs[i];
>  			break;
>  		}
>  		if (proglen == oldproglen) {
> -			image = module_alloc(proglen);
> -			if (!image)
> +			header = bpf_alloc_binary(proglen, &image);
> +			if (!header)
>  				goto out;
>  		}
>  		oldproglen = proglen;
> @@ -728,7 +763,8 @@ cond_branch:			f_offset = addrs[i + filter[i].jf] - addrs[i];
>  		bpf_jit_dump(flen, proglen, pass, image);
>
>  	if (image) {
> -		bpf_flush_icache(image, image + proglen);
> +		bpf_flush_icache(header, image + proglen);
> +		set_memory_ro((unsigned long)header, header->pages);
>  		fp->bpf_func = (void *)image;
>  	}
>  out:
> @@ -738,6 +774,11 @@ out:
>
>  void bpf_jit_free(struct sk_filter *fp)
>  {
> -	if (fp->bpf_func != sk_run_filter)
> -		module_free(NULL, fp->bpf_func);
> +	if (fp->bpf_func != sk_run_filter) {
> +		unsigned long addr = (unsigned long)fp->bpf_func & PAGE_MASK;
> +		struct bpf_binary_header *header = (void *)addr;
> +
> +		set_memory_rw(addr, header->pages);
> +		module_free(NULL, header);
> +	}
>  }
diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index c0212db..79c216a 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -12,6 +12,7 @@
 #include <linux/netdevice.h>
 #include <linux/filter.h>
 #include <linux/if_vlan.h>
+#include <linux/random.h>
 
 /*
  * Conventions :
@@ -144,6 +145,39 @@ static int pkt_type_offset(void)
 	return -1;
 }
 
+struct bpf_binary_header {
+	unsigned int	pages;
+	/* Note : for security reasons, bpf code will follow a randomly
+	 * sized amount of int3 instructions
+	 */
+	u8		image[];
+};
+
+static struct bpf_binary_header *bpf_alloc_binary(unsigned int proglen,
+						  u8 **image_ptr)
+{
+	unsigned int sz, hole;
+	struct bpf_binary_header *header;
+
+	/* Most of BPF filters are really small,
+	 * but if some of them fill a page, allow at least
+	 * 128 extra bytes to insert a random section of int3
+	 */
+	sz = round_up(proglen + sizeof(*header) + 128, PAGE_SIZE);
+	header = module_alloc(sz);
+	if (!header)
+		return NULL;
+
+	memset(header, 0xcc, sz); /* fill whole space with int3 instructions */
+
+	header->pages = sz / PAGE_SIZE;
+	hole = sz - (proglen + sizeof(*header));
+
+	/* insert a random number of int3 instructions before BPF code */
+	*image_ptr = &header->image[prandom_u32() % hole];
+	return header;
+}
+
 void bpf_jit_compile(struct sk_filter *fp)
 {
 	u8 temp[64];
@@ -153,6 +187,7 @@ void bpf_jit_compile(struct sk_filter *fp)
 	int t_offset, f_offset;
 	u8 t_op, f_op, seen = 0, pass;
 	u8 *image = NULL;
+	struct bpf_binary_header *header = NULL;
 	u8 *func;
 	int pc_ret0 = -1; /* bpf index of first RET #0 instruction (if any) */
 	unsigned int cleanup_addr; /* epilogue code offset */
@@ -693,7 +728,7 @@ cond_branch:			f_offset = addrs[i + filter[i].jf] - addrs[i];
 			if (unlikely(proglen + ilen > oldproglen)) {
 				pr_err("bpb_jit_compile fatal error\n");
 				kfree(addrs);
-				module_free(NULL, image);
+				module_free(NULL, header);
 				return;
 			}
 			memcpy(image + proglen, temp, ilen);
@@ -717,8 +752,8 @@ cond_branch:			f_offset = addrs[i + filter[i].jf] - addrs[i];
 			break;
 		}
 		if (proglen == oldproglen) {
-			image = module_alloc(proglen);
-			if (!image)
+			header = bpf_alloc_binary(proglen, &image);
+			if (!header)
 				goto out;
 		}
 		oldproglen = proglen;
@@ -728,7 +763,8 @@ cond_branch:			f_offset = addrs[i + filter[i].jf] - addrs[i];
 		bpf_jit_dump(flen, proglen, pass, image);
 
 	if (image) {
-		bpf_flush_icache(image, image + proglen);
+		bpf_flush_icache(header, image + proglen);
+		set_memory_ro((unsigned long)header, header->pages);
 		fp->bpf_func = (void *)image;
 	}
 out:
@@ -738,6 +774,11 @@ out:
 
 void bpf_jit_free(struct sk_filter *fp)
 {
-	if (fp->bpf_func != sk_run_filter)
-		module_free(NULL, fp->bpf_func);
+	if (fp->bpf_func != sk_run_filter) {
+		unsigned long addr = (unsigned long)fp->bpf_func & PAGE_MASK;
+		struct bpf_binary_header *header = (void *)addr;
+
+		set_memory_rw(addr, header->pages);
+		module_free(NULL, header);
+	}
 }