
[net-next] x86: bpf_jit_comp: secure bpf jit against spraying attacks

Message ID 1368844623.3301.142.camel@edumazet-glaptop
State Accepted, archived
Delegated to: David Miller

Commit Message

Eric Dumazet May 18, 2013, 2:37 a.m. UTC
From: Eric Dumazet <edumazet@google.com>

hpa brought to my attention some security-related issues
with BPF JIT on x86.

This patch makes sure the bpf generated code is marked read only,
as other kernel text sections.

It also splits the unused space (we vmalloc() and only use a fraction of
the page) in two parts, so that the generated bpf code does not start at a
known offset in the page, but at a pseudo-random one.

Refs:
http://mainisusuallyafunction.blogspot.com/2012/11/attacking-hardened-linux-systems-with.html

Reported-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 arch/x86/net/bpf_jit_comp.c |   53 ++++++++++++++++++++++++++++++----
 1 file changed, 47 insertions(+), 6 deletions(-)
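
To make the sizing concrete, here is a minimal standalone sketch of the
allocation math from the patch below; the example proglen, the 4-byte
header and a PAGE_SIZE of 4096 are illustrative assumptions, and rand()
stands in for the kernel's prandom_u32():

/* Standalone mock of bpf_alloc_binary()'s sizing -- not kernel code. */
#include <stdio.h>
#include <stdlib.h>

#define PAGE_SIZE 4096u
#define ROUND_UP(x, a) (((x) + (a) - 1) & ~((a) - 1))

int main(void)
{
	unsigned int proglen = 300;  /* example: a small filter */
	unsigned int hdr = 4;        /* sizeof(struct bpf_binary_header) */
	unsigned int sz = ROUND_UP(proglen + hdr + 128, PAGE_SIZE);
	unsigned int hole = sz - (proglen + hdr);

	/* generated code starts at a random offset inside the hole */
	unsigned int offset = rand() % hole;

	printf("sz=%u hole=%u code at header->image[%u]\n",
	       sz, hole, offset);	/* here: sz=4096 hole=3792 */
	return 0;
}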




Comments

Daniel Borkmann May 19, 2013, 5:02 p.m. UTC | #1
On 05/18/2013 04:37 AM, Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@google.com>
>
> hpa brought to my attention some security-related issues
> with BPF JIT on x86.
>
> This patch makes sure the bpf generated code is marked read only,
> as other kernel text sections.
>
> It also splits the unused space (we vmalloc() and only use a fraction of
> the page) in two parts, so that the generated bpf code does not start at a
> known offset in the page, but at a pseudo-random one.
>
> Refs:
> http://mainisusuallyafunction.blogspot.com/2012/11/attacking-hardened-linux-systems-with.html
>
> Reported-by: H. Peter Anvin <hpa@zytor.com>
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Great work!

Probably other archs could later follow up with setting it read-only, too.

Reviewed-by: Daniel Borkmann <dborkman@redhat.com>
David Miller May 20, 2013, 6:55 a.m. UTC | #2
From: Daniel Borkmann <dborkman@redhat.com>
Date: Sun, 19 May 2013 19:02:46 +0200

> Probably other archs could later follow up with setting it
> read-only, too.

Only s390 and x86 support this facility.
David Miller May 20, 2013, 6:56 a.m. UTC | #3
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Fri, 17 May 2013 19:37:03 -0700

> From: Eric Dumazet <edumazet@google.com>
> 
> hpa brought to my attention some security-related issues
> with BPF JIT on x86.
> 
> This patch makes sure the bpf generated code is marked read only,
> as other kernel text sections.
> 
> It also splits the unused space (we vmalloc() and only use a fraction of
> the page) in two parts, so that the generated bpf code does not start at a
> known offset in the page, but at a pseudo-random one.
> 
> Refs:
> http://mainisusuallyafunction.blogspot.com/2012/11/attacking-hardened-linux-systems-with.html
> 
> Reported-by: H. Peter Anvin <hpa@zytor.com>
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Applied.
David Laight May 20, 2013, 8:51 a.m. UTC | #4
> hpa brought to my attention some security-related issues
> with BPF JIT on x86.
>
> This patch makes sure the bpf generated code is marked read only,
> as other kernel text sections.
>
> It also splits the unused space (we vmalloc() and only use a fraction of
> the page) in two parts, so that the generated bpf code does not start at a
> known offset in the page, but at a pseudo-random one.
...
> +static struct bpf_binary_header *bpf_alloc_binary(unsigned int proglen,
> +						  u8 **image_ptr)
...
> +	/* insert a random number of int3 instructions before BPF code */
> +	*image_ptr = &header->image[prandom_u32() % hole];
> +	return header;
> +}

Hmmm.... anyone looking to overwrite kernel code will then start
looking for blocks of 0xcc bytes and know that what follows
is the beginning of a function.
That isn't any harder than random writes.

Copying a random part of .rodata might be better - especially
if you can find part of .rodata.str*.

	David
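
A minimal sketch of what David's .rodata idea might look like --
__start_rodata/__end_rodata are the kernel's real section markers, but
the helper, its placement and the bounds handling are assumptions, not
a tested implementation:

/* Hypothetical alternative to the 0xcc fill: copy bytes from a random
 * spot in .rodata so the hole carries no recognizable signature.
 * Simplified: assumes len is much smaller than the .rodata section.
 */
#include <asm/sections.h>
#include <linux/random.h>
#include <linux/string.h>
#include <linux/types.h>

static void fill_hole_from_rodata(u8 *buf, unsigned int len)
{
	size_t rodata_len = __end_rodata - __start_rodata;
	size_t off = prandom_u32() % (rodata_len - len);

	memcpy(buf, __start_rodata + off, len);
}
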
Daniel Borkmann May 20, 2013, 9:50 a.m. UTC | #5
On 05/20/2013 10:51 AM, David Laight wrote:
>> hpa brought to my attention some security-related issues
>> with BPF JIT on x86.
>>
>> This patch makes sure the bpf generated code is marked read only,
>> as other kernel text sections.
>>
>> It also splits the unused space (we vmalloc() and only use a fraction of
>> the page) in two parts, so that the generated bpf code does not start at a
>> known offset in the page, but at a pseudo-random one.
> ...
>> +static struct bpf_binary_header *bpf_alloc_binary(unsigned int proglen,
>> +						  u8 **image_ptr)
> ...
>> +	/* insert a random number of int3 instructions before BPF code */
>> +	*image_ptr = &header->image[prandom_u32() % hole];
>> +	return header;
>> +}
>
> Hmmm.... anyone looking to overwrite kernel code will then start
> looking for blocks of 0xcc bytes and know that what follows
> is the beginning of a function.
> That isn't any harder than random writes.
>
> Copying a random part of .rodata might be better - especially
> if you can find part of .rodata.str*.

There also seems to be another approach ...

   http://grsecurity.net/~spender/jit_prot.diff

via: http://www.reddit.com/r/netsec/comments/13dzhx/linux_kernel_jit_spray_for_smep_kernexec_bypass/
Eric Dumazet May 20, 2013, 1:34 p.m. UTC | #6
On Mon, 2013-05-20 at 09:51 +0100, David Laight wrote:

> Hmmm.... anyone looking to overwrite kernel code will then start
> looking for blocks of 0xcc bytes and know that what follows
> is the beginning of a function.
> That isn't any harder than random writes.
> 
> Copying a random part of .rodata might be better - especially
> if you can find part of .rodata.str*.

That's not the point. We want to catch jumps landing before/after the
code: anything that lands in the 0xcc padding triggers an int3 breakpoint
trap instead of executing attacker-aligned bytes.

An attacker with full read and write access to kernel code has full
power anyway to do whatever he wants.


Eric Dumazet May 20, 2013, 1:52 p.m. UTC | #7
On Mon, 2013-05-20 at 11:50 +0200, Daniel Borkmann wrote:

> There also seems to be another approach ...
> 
>    http://grsecurity.net/~spender/jit_prot.diff
> 
> via: http://www.reddit.com/r/netsec/comments/13dzhx/linux_kernel_jit_spray_for_smep_kernexec_bypass/


Well, there are many approaches, and I have another one as well, provided
by H. Peter Anvin.

The idea was to allow the code to be relocated outside of the 2GB space
that we use for kernel code (including module_alloc()).

So every helper call, coded as "e8 xx xx xx xx", was replaced by

"48 c7 c0 yy yy yy yy   mov $foo,%rax"
"ff d0                  call *%rax"

The RO protection + random holes idea was a solution with no performance
impact.

Another idea is to limit BPF JIT to root users. I do not think BPF JIT
is mandatory at all, as tcpdump is already restricted.
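
A rough sketch of how an emitter might produce that sequence -- the
function name and shape are hypothetical, not the actual bpf_jit_comp.c
code; only the byte encodings come from the mail above:

/* Hypothetical emitter: an absolute call through %rax works no matter
 * where the JIT buffer lives, unlike "e8" rel32 which needs the target
 * within +/-2GB of the call site.
 */
#include <linux/string.h>
#include <linux/types.h>

static u8 *emit_call_rax(u8 *prog, void *helper)
{
	u32 lo = (u32)(unsigned long)helper;

	/* 48 c7 c0 yy yy yy yy : mov $helper,%rax (sign-extended imm32,
	 * which is fine for helpers living in kernel text) */
	*prog++ = 0x48; *prog++ = 0xc7; *prog++ = 0xc0;
	memcpy(prog, &lo, 4);
	prog += 4;

	/* ff d0 : call *%rax */
	*prog++ = 0xff; *prog++ = 0xd0;
	return prog;
}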



Florian Westphal May 20, 2013, 2:19 p.m. UTC | #8
Eric Dumazet <eric.dumazet@gmail.com> wrote:
> From: Eric Dumazet <edumazet@google.com>
> 
> hpa brought to my attention some security-related issues
> with BPF JIT on x86.
> 
> This patch makes sure the bpf generated code is marked read only,
> as other kernel text sections.
> 
> It also splits the unused space (we vmalloc() and only use a fraction of
> the page) in two parts, so that the generated bpf code does not start at a
> known offset in the page, but at a pseudo-random one.
> 
> Refs:
> http://mainisusuallyafunction.blogspot.com/2012/11/attacking-hardened-linux-systems-with.html

What about emitting additional instructions at random locations in the
generated code itself?

E.g., after every instruction, have a random chance to insert
'xor $0xcc,%al; xor $0xcc,%al', etc?
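
A minimal sketch of that idea, which is not part of the applied patch --
the helper name is hypothetical, and an emitter would have to call it
after each translated instruction:

/* Self-cancelling pair: "34 cc 34 cc" encodes
 * xor $0xcc,%al; xor $0xcc,%al, which leaves %al unchanged but
 * scatters 0xcc immediates through any misaligned view of the
 * byte stream an attacker might try to execute.
 */
#include <linux/random.h>
#include <linux/types.h>

static u8 *maybe_emit_chaff(u8 *prog)
{
	if (prandom_u32() & 1) {
		*prog++ = 0x34; *prog++ = 0xcc;	/* xor $0xcc,%al */
		*prog++ = 0x34; *prog++ = 0xcc;	/* undo it */
	}
	return prog;
}
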
Eric Dumazet May 20, 2013, 2:26 p.m. UTC | #9
On Mon, 2013-05-20 at 16:19 +0200, Florian Westphal wrote:

> What about emitting additional instructions at random locations in the
> generated code itself?
> 
> E.g., after every instruction, have a random chance to insert
> 'xor $0xcc,%al; xor $0xcc,%al', etc?

This would be the last thing I'd do.

Frankly, the whole point of BPF JIT is speed.

If we have slow code, just use the interpreter instead.



David Laight May 20, 2013, 2:35 p.m. UTC | #10
> > What about emitting additional instructions at random locations in the
> > generated code itself?
> >
> > E.g., after every instruction, have a random chance to insert
> > 'xor $0xcc,%al; xor $0xcc,%al', etc?
>
> This would be the last thing I'd do.
>
> Frankly, the whole point of BPF JIT is speed.
>
> If we have slow code, just use the interpreter instead.

Adding one of the standard nop opcodes wouldn't be too bad.
IIRC 0x90 is skipped very early on by modern CPUs.
Adding one after every nth (or n-mth) instruction would
probably break the alternate (unintended) instruction stream.

However, the attacker could (probably) keep installing
code patterns until the guessed pattern matched.

Also the code size changes might make the JIT compile fail
- maybe because of branch offsets, or just size.

	David
Jarkko Sakkinen May 24, 2013, 5:23 p.m. UTC | #11
Hi Eric,

Peter talked to me about this BPF work to prevent JIT spraying attacks
at the beginning of this week, and I took a look at your patch.

Some comments:

* Meta-comment about patch structure: why was this one patch and not
   two? It changes two things that are orthogonal to each other
   (random offset, RW -> RO change).
* Should the NX bit be turned on while the JIT code is being prepared?
* How hard would it be to read the value of the bpf_func pointer? If an
   attacker is able to read that, it would compromise the whole
   randomization scheme.
* I loved the socket creation trick in the blog post :) Are there any
   plans to do something about it?
* How was the minimum entropy of 128 bytes chosen? The patch description
   does not explain this in any way, although it seems like a decent choice.

/Jarkko

On 17.05.2013 19:37, Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@google.com>
>
> hpa brought to my attention some security-related issues
> with BPF JIT on x86.
>
> This patch makes sure the bpf generated code is marked read only,
> as other kernel text sections.
>
> It also splits the unused space (we vmalloc() and only use a fraction of
> the page) in two parts, so that the generated bpf code does not start at a
> known offset in the page, but at a pseudo-random one.
>
> Refs:
> http://mainisusuallyafunction.blogspot.com/2012/11/attacking-hardened-linux-systems-with.html
>
> Reported-by: H. Peter Anvin <hpa@zytor.com>
> Signed-off-by: Eric Dumazet <edumazet@google.com>


Patch

diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index c0212db..79c216a 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -12,6 +12,7 @@ 
 #include <linux/netdevice.h>
 #include <linux/filter.h>
 #include <linux/if_vlan.h>
+#include <linux/random.h>
 
 /*
  * Conventions :
@@ -144,6 +145,39 @@  static int pkt_type_offset(void)
 	return -1;
 }
 
+struct bpf_binary_header {
+	unsigned int	pages;
+	/* Note : for security reasons, bpf code will follow a randomly
+	 * sized amount of int3 instructions
+	 */
+	u8		image[];
+};
+
+static struct bpf_binary_header *bpf_alloc_binary(unsigned int proglen,
+						  u8 **image_ptr)
+{
+	unsigned int sz, hole;
+	struct bpf_binary_header *header;
+
+	/* Most of BPF filters are really small,
+	 * but if some of them fill a page, allow at least
+	 * 128 extra bytes to insert a random section of int3
+	 */
+	sz = round_up(proglen + sizeof(*header) + 128, PAGE_SIZE);
+	header = module_alloc(sz);
+	if (!header)
+		return NULL;
+
+	memset(header, 0xcc, sz); /* fill whole space with int3 instructions */
+
+	header->pages = sz / PAGE_SIZE;
+	hole = sz - (proglen + sizeof(*header));
+
+	/* insert a random number of int3 instructions before BPF code */
+	*image_ptr = &header->image[prandom_u32() % hole];
+	return header;
+}
+
 void bpf_jit_compile(struct sk_filter *fp)
 {
 	u8 temp[64];
@@ -153,6 +187,7 @@  void bpf_jit_compile(struct sk_filter *fp)
 	int t_offset, f_offset;
 	u8 t_op, f_op, seen = 0, pass;
 	u8 *image = NULL;
+	struct bpf_binary_header *header = NULL;
 	u8 *func;
 	int pc_ret0 = -1; /* bpf index of first RET #0 instruction (if any) */
 	unsigned int cleanup_addr; /* epilogue code offset */
@@ -693,7 +728,7 @@  cond_branch:			f_offset = addrs[i + filter[i].jf] - addrs[i];
 				if (unlikely(proglen + ilen > oldproglen)) {
 					pr_err("bpb_jit_compile fatal error\n");
 					kfree(addrs);
-					module_free(NULL, image);
+					module_free(NULL, header);
 					return;
 				}
 				memcpy(image + proglen, temp, ilen);
@@ -717,8 +752,8 @@  cond_branch:			f_offset = addrs[i + filter[i].jf] - addrs[i];
 			break;
 		}
 		if (proglen == oldproglen) {
-			image = module_alloc(proglen);
-			if (!image)
+			header = bpf_alloc_binary(proglen, &image);
+			if (!header)
 				goto out;
 		}
 		oldproglen = proglen;
@@ -728,7 +763,8 @@  cond_branch:			f_offset = addrs[i + filter[i].jf] - addrs[i];
 		bpf_jit_dump(flen, proglen, pass, image);
 
 	if (image) {
-		bpf_flush_icache(image, image + proglen);
+		bpf_flush_icache(header, image + proglen);
+		set_memory_ro((unsigned long)header, header->pages);
 		fp->bpf_func = (void *)image;
 	}
 out:
@@ -738,6 +774,11 @@  out:
 
 void bpf_jit_free(struct sk_filter *fp)
 {
-	if (fp->bpf_func != sk_run_filter)
-		module_free(NULL, fp->bpf_func);
+	if (fp->bpf_func != sk_run_filter) {
+		unsigned long addr = (unsigned long)fp->bpf_func & PAGE_MASK;
+		struct bpf_binary_header *header = (void *)addr;
+
+		set_memory_rw(addr, header->pages);
+		module_free(NULL, header);
+	}
 }
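
One note on the free path in the patch above: the header is recovered by
masking the code pointer down to its page, which works because
module_alloc() returns page-aligned memory with the header at the very
start -- the scheme relies on the random offset keeping the image within
that first page.

/* Mirrors bpf_jit_free() above: fp->bpf_func points into the first
 * page, so the page boundary below it is where the header lives.
 */
unsigned long addr = (unsigned long)fp->bpf_func & PAGE_MASK;
struct bpf_binary_header *header = (void *)addr;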