diff mbox

[net] net: filter: x86: fix JIT address randomization

Message ID 1400007214-3236-1-git-send-email-ast@plumgrid.com
State Changes Requested, archived
Delegated to: David Miller
Headers show

Commit Message

Alexei Starovoitov May 13, 2014, 6:53 p.m. UTC
bpf_alloc_binary() adds 128 bytes of room to JITed program image
and rounds it up to the nearest page size. If image size is close
to page size (like 4000), it is rounded to two pages:
round_up(4000 + 4 + 128) == 8192
then 'hole' is computed as 8192 - (4000 + 4) = 4188
If prandom_u32() % hole selects a number >= 4096, then kernel will crash
during bpf_jit_free():

kernel BUG at arch/x86/mm/pageattr.c:887!
Call Trace:
 [<ffffffff81037285>] change_page_attr_set_clr+0x135/0x460
 [<ffffffff81694cc0>] ? _raw_spin_unlock_irq+0x30/0x50
 [<ffffffff810378ff>] set_memory_rw+0x2f/0x40
 [<ffffffffa01a0d8d>] bpf_jit_free_deferred+0x2d/0x60
 [<ffffffff8106bf98>] process_one_work+0x1d8/0x6a0
 [<ffffffff8106bf38>] ? process_one_work+0x178/0x6a0
 [<ffffffff8106c90c>] worker_thread+0x11c/0x370

since bpf_jit_free() does:
  unsigned long addr = (unsigned long)fp->bpf_func & PAGE_MASK;
  struct bpf_binary_header *header = (void *)addr;
to compute start address of 'bpf_binary_header'
and header->pages will pass junk to:
  set_memory_rw(addr, header->pages);

Fix it by picking image offset always out of the first page.

While at it make the offset to be within first half of the page,
so there is some room for CPU to run before it hits page miss.

Fixes: 314beb9bcabfd ("x86: bpf_jit_comp: secure bpf jit against spraying attacks")
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
---

s390 commit aa2d2c73c21f ("s390/bpf,jit: address randomize and write protect jit code")
seems to have the same problem

 arch/x86/net/bpf_jit_comp.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Eric Dumazet May 13, 2014, 8:23 p.m. UTC | #1
On Tue, 2014-05-13 at 11:53 -0700, Alexei Starovoitov wrote:
> bpf_alloc_binary() adds 128 bytes of room to JITed program image
> and rounds it up to the nearest page size. If image size is close
> to page size (like 4000), it is rounded to two pages:
> round_up(4000 + 4 + 128) == 8192
> then 'hole' is computed as 8192 - (4000 + 4) = 4188
> If prandom_u32() % hole selects a number >= 4096, then kernel will crash
> during bpf_jit_free():
> 
> kernel BUG at arch/x86/mm/pageattr.c:887!
> Call Trace:
>  [<ffffffff81037285>] change_page_attr_set_clr+0x135/0x460
>  [<ffffffff81694cc0>] ? _raw_spin_unlock_irq+0x30/0x50
>  [<ffffffff810378ff>] set_memory_rw+0x2f/0x40
>  [<ffffffffa01a0d8d>] bpf_jit_free_deferred+0x2d/0x60
>  [<ffffffff8106bf98>] process_one_work+0x1d8/0x6a0
>  [<ffffffff8106bf38>] ? process_one_work+0x178/0x6a0
>  [<ffffffff8106c90c>] worker_thread+0x11c/0x370
> 
> since bpf_jit_free() does:
>   unsigned long addr = (unsigned long)fp->bpf_func & PAGE_MASK;
>   struct bpf_binary_header *header = (void *)addr;
> to compute start address of 'bpf_binary_header'
> and header->pages will pass junk to:
>   set_memory_rw(addr, header->pages);
> 
> Fix it by picking image offset always out of the first page.

You mean, offset should be in first page ?

> 
> While at it make the offset to be within first half of the page,
> so there is some room for CPU to run before it hits page miss.
> 
> Fixes: 314beb9bcabfd ("x86: bpf_jit_comp: secure bpf jit against spraying attacks")
> Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
> ---
> 
> s390 commit aa2d2c73c21f ("s390/bpf,jit: address randomize and write protect jit code")
> seems to have the same problem
> 
>  arch/x86/net/bpf_jit_comp.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
> index dc01773..c6ab7a0 100644
> --- a/arch/x86/net/bpf_jit_comp.c
> +++ b/arch/x86/net/bpf_jit_comp.c
> @@ -171,7 +171,7 @@ static struct bpf_binary_header *bpf_alloc_binary(unsigned int proglen,
>  	memset(header, 0xcc, sz); /* fill whole space with int3 instructions */
>  
>  	header->pages = sz / PAGE_SIZE;
> -	hole = sz - (proglen + sizeof(*header));
> +	hole = min(sz - (proglen + sizeof(*header)), PAGE_SIZE / 2);
>  
>  	/* insert a random number of int3 instructions before BPF code */
>  	*image_ptr = &header->image[prandom_u32() % hole];

Good catch, but I am not sure about the PAGE_SIZE / 2

The argument of not having code ending on (or being very close of) page
boundary seems orthogonal to this bug fix.

Thanks


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alexei Starovoitov May 13, 2014, 8:34 p.m. UTC | #2
On Tue, May 13, 2014 at 1:23 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Tue, 2014-05-13 at 11:53 -0700, Alexei Starovoitov wrote:
>> bpf_alloc_binary() adds 128 bytes of room to JITed program image
>> and rounds it up to the nearest page size. If image size is close
>> to page size (like 4000), it is rounded to two pages:
>> round_up(4000 + 4 + 128) == 8192
>> then 'hole' is computed as 8192 - (4000 + 4) = 4188
>> If prandom_u32() % hole selects a number >= 4096, then kernel will crash
>> during bpf_jit_free():
>>
>> kernel BUG at arch/x86/mm/pageattr.c:887!
>> Call Trace:
>>  [<ffffffff81037285>] change_page_attr_set_clr+0x135/0x460
>>  [<ffffffff81694cc0>] ? _raw_spin_unlock_irq+0x30/0x50
>>  [<ffffffff810378ff>] set_memory_rw+0x2f/0x40
>>  [<ffffffffa01a0d8d>] bpf_jit_free_deferred+0x2d/0x60
>>  [<ffffffff8106bf98>] process_one_work+0x1d8/0x6a0
>>  [<ffffffff8106bf38>] ? process_one_work+0x178/0x6a0
>>  [<ffffffff8106c90c>] worker_thread+0x11c/0x370
>>
>> since bpf_jit_free() does:
>>   unsigned long addr = (unsigned long)fp->bpf_func & PAGE_MASK;
>>   struct bpf_binary_header *header = (void *)addr;
>> to compute start address of 'bpf_binary_header'
>> and header->pages will pass junk to:
>>   set_memory_rw(addr, header->pages);
>>
>> Fix it by picking image offset always out of the first page.
>
> You mean, offset should be in first page ?

yes.
&header->image[prandom_u32() % hole]
should in the same page as
&header

>>
>> While at it make the offset to be within first half of the page,
>> so there is some room for CPU to run before it hits page miss.
>>
>> Fixes: 314beb9bcabfd ("x86: bpf_jit_comp: secure bpf jit against spraying attacks")
>> Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
>> ---
>>
>> s390 commit aa2d2c73c21f ("s390/bpf,jit: address randomize and write protect jit code")
>> seems to have the same problem
>>
>>  arch/x86/net/bpf_jit_comp.c |    2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
>> index dc01773..c6ab7a0 100644
>> --- a/arch/x86/net/bpf_jit_comp.c
>> +++ b/arch/x86/net/bpf_jit_comp.c
>> @@ -171,7 +171,7 @@ static struct bpf_binary_header *bpf_alloc_binary(unsigned int proglen,
>>       memset(header, 0xcc, sz); /* fill whole space with int3 instructions */
>>
>>       header->pages = sz / PAGE_SIZE;
>> -     hole = sz - (proglen + sizeof(*header));
>> +     hole = min(sz - (proglen + sizeof(*header)), PAGE_SIZE / 2);
>>
>>       /* insert a random number of int3 instructions before BPF code */
>>       *image_ptr = &header->image[prandom_u32() % hole];
>
> Good catch, but I am not sure about the PAGE_SIZE / 2
>
> The argument of not having code ending on (or being very close of) page
> boundary seems orthogonal to this bug fix.

Gotta pick some number... page/2 seems good enough to have
large range for prandom() to choose and better performance.
Another alternative is to do min(…, PAGE_SIZE - sizeof(*header)),
but that is harder to understand.

Also just realized that I miscalculated the breaking point:
"If prandom_u32() % hole selects a number >= 4096, then kernel will crash"
it should read: "… >= 4092 ..."
since sizeof(*header) needs to be accounted for.

> Thanks
>
>
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
H. Peter Anvin May 13, 2014, 9:28 p.m. UTC | #3
On 05/13/2014 01:34 PM, Alexei Starovoitov wrote:
>>
>> The argument of not having code ending on (or being very close of) page
>> boundary seems orthogonal to this bug fix.
> 
> Gotta pick some number... page/2 seems good enough to have
> large range for prandom() to choose and better performance.
> Another alternative is to do min(…, PAGE_SIZE - sizeof(*header)),
> but that is harder to understand.
> 

The latter is correct by construction, and thus doesn't end up with the
question "what is going on here" or has hidden failure conditions.

> Also just realized that I miscalculated the breaking point:
> "If prandom_u32() % hole selects a number >= 4096, then kernel will crash"
> it should read: "… >= 4092 ..."
> since sizeof(*header) needs to be accounted for.

No, it should read PAGE_SIZE - sizeof(*header) if anything.

	-hpa


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller May 13, 2014, 9:38 p.m. UTC | #4
From: "H. Peter Anvin" <hpa@zytor.com>
Date: Tue, 13 May 2014 14:28:55 -0700

> On 05/13/2014 01:34 PM, Alexei Starovoitov wrote:
>>>
>>> The argument of not having code ending on (or being very close of) page
>>> boundary seems orthogonal to this bug fix.
>> 
>> Gotta pick some number... page/2 seems good enough to have
>> large range for prandom() to choose and better performance.
>> Another alternative is to do min(…, PAGE_SIZE - sizeof(*header)),
>> but that is harder to understand.
>> 
> 
> The latter is correct by construction, and thus doesn't end up with the
> question "what is going on here" or has hidden failure conditions.

Agreed.

>> Also just realized that I miscalculated the breaking point:
>> "If prandom_u32() % hole selects a number >= 4096, then kernel will crash"
>> it should read: "… >= 4092 ..."
>> since sizeof(*header) needs to be accounted for.
> 
> No, it should read PAGE_SIZE - sizeof(*header) if anything.

Also agreed.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Heiko Carstens May 14, 2014, 7:36 a.m. UTC | #5
On Tue, May 13, 2014 at 11:53:34AM -0700, Alexei Starovoitov wrote:
> bpf_alloc_binary() adds 128 bytes of room to JITed program image
> and rounds it up to the nearest page size. If image size is close
> to page size (like 4000), it is rounded to two pages:
> round_up(4000 + 4 + 128) == 8192
> then 'hole' is computed as 8192 - (4000 + 4) = 4188
> If prandom_u32() % hole selects a number >= 4096, then kernel will crash
> during bpf_jit_free():

[...]

> Fixes: 314beb9bcabfd ("x86: bpf_jit_comp: secure bpf jit against spraying attacks")
> Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
> ---
> 
> s390 commit aa2d2c73c21f ("s390/bpf,jit: address randomize and write protect jit code")
> seems to have the same problem

Yes, that's the same bug on s390. Would you mind fixing s390 as well, since I
assume you're going to send a new patch for x86?

Would be good to keep the code quite identical so these issues can be easily
seen across architectures.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index dc01773..c6ab7a0 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -171,7 +171,7 @@  static struct bpf_binary_header *bpf_alloc_binary(unsigned int proglen,
 	memset(header, 0xcc, sz); /* fill whole space with int3 instructions */
 
 	header->pages = sz / PAGE_SIZE;
-	hole = sz - (proglen + sizeof(*header));
+	hole = min(sz - (proglen + sizeof(*header)), PAGE_SIZE / 2);
 
 	/* insert a random number of int3 instructions before BPF code */
 	*image_ptr = &header->image[prandom_u32() % hole];