[net,0/3] Fix for BPF devmap percpu allocation splat

Message ID cover.1508251210.git.daniel@iogearbox.net
Series Fix for BPF devmap percpu allocation splat

Message

Daniel Borkmann Oct. 17, 2017, 2:55 p.m. UTC
The set fixes a splat in the devmap percpu allocation when we allocate
the flush bitmap. Patch 1 is a prerequisite for the fix in patch 2.
Since patch 1 is rather small, it would be good if it could be routed
via -net, for example, with Tejun's Ack. Patch 3 gets rid of the
remaining PCPU_MIN_UNIT_SIZE checks, which are percpu allocator
internals and should not be used.
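To make the mechanism concrete, here is a small userspace model of the check involved. It assumes the percpu allocator rejects requests of size zero, larger than PCPU_MIN_UNIT_SIZE (32 KiB), or aligned beyond a page by returning NULL, and that patch 1 makes the accompanying warning conditional on __GFP_NOWARN. Everything prefixed `model_` is hypothetical and not kernel code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Userspace model only: names and values are illustrative stand-ins,
 * not the kernel's definitions. */
#define MODEL_PAGE_SIZE     4096u
#define MODEL_MIN_UNIT_SIZE (32u << 10)  /* 32 KiB, like PCPU_MIN_UNIT_SIZE */
#define MODEL_GFP_NOWARN    0x1u         /* stand-in for __GFP_NOWARN */

/* Counts emitted warnings so behaviour can be checked without stderr. */
static unsigned int model_warn_count;

/* Reject illegal requests with NULL; only warn when the caller did not
 * pass the NOWARN flag. */
static void *model_pcpu_alloc(size_t size, size_t align, unsigned int gfp)
{
	bool do_warn = !(gfp & MODEL_GFP_NOWARN);

	if (size == 0 || size > MODEL_MIN_UNIT_SIZE || align > MODEL_PAGE_SIZE) {
		if (do_warn) {
			model_warn_count++;
			fprintf(stderr,
				"illegal size (%zu) or align (%zu) for percpu allocation\n",
				size, align);
		}
		return NULL;
	}
	/* A single zeroed buffer stands in for the per-CPU copies. */
	return calloc(1, size);
}
```

In this model, a caller that passes the NOWARN flag simply sees NULL and can fail the map creation with an error, rather than producing a warning splat in the log.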

Thanks!

Daniel Borkmann (3):
  mm, percpu: add support for __GFP_NOWARN flag
  bpf: fix splat for illegal devmap percpu allocation
  bpf: do not test for PCPU_MIN_UNIT_SIZE before percpu allocations

 kernel/bpf/arraymap.c |  2 +-
 kernel/bpf/devmap.c   |  5 +++--
 kernel/bpf/hashtab.c  |  4 ----
 mm/percpu.c           | 15 ++++++++++-----
 4 files changed, 14 insertions(+), 12 deletions(-)

Comments

David Laight Oct. 17, 2017, 3:03 p.m. UTC | #1
From: Daniel Borkmann
> Sent: 17 October 2017 15:56
> 
> The set fixes a splat in devmap percpu allocation when we alloc
> the flush bitmap. Patch 1 is a prerequisite for the fix in patch 2,
> patch 1 is rather small, so if this could be routed via -net, for
> example, with Tejun's Ack that would be good. Patch 3 gets rid of
> remaining PCPU_MIN_UNIT_SIZE checks, which are percpu allocator
> internals and should not be used.

Does it make sense to let a user program keep trying to allocate
successively smaller (but still very large) maps until one succeeds,
thus using up all the percpu space?

Or is this a 'root only' 'shoot self in foot' job?

	David
Daniel Borkmann Oct. 17, 2017, 3:11 p.m. UTC | #2
On 10/17/2017 05:03 PM, David Laight wrote:
> From: Daniel Borkmann
>> Sent: 17 October 2017 15:56
>>
>> The set fixes a splat in devmap percpu allocation when we alloc
>> the flush bitmap. Patch 1 is a prerequisite for the fix in patch 2,
>> patch 1 is rather small, so if this could be routed via -net, for
>> example, with Tejun's Ack that would be good. Patch 3 gets rid of
>> remaining PCPU_MIN_UNIT_SIZE checks, which are percpu allocator
>> internals and should not be used.
>
> Does it make sense to allow the user program to try to allocate ever
> smaller very large maps until it finds one that succeeds - thus
> using up all the percpu space?
>
> Or is this a 'root only' 'shoot self in foot' job?

It's root only, although John still has a pending fix to be flushed
out to -net in the next few days that actually enforces that cap
(devmap is not in an official kernel release yet, so all good).
Apart from this, all map allocations are accounted for in general
as well.
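The accounting mentioned here is what bounds the "ever smaller maps" strategy. A simplified sketch, assuming each map's memory (including one copy per CPU for per-CPU data) is rounded up to pages and charged against a per-user limit such as a locked-memory rlimit; the `model_` names and the fixed 4 KiB page size are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096u  /* assumed page size for the model */

/* Hypothetical per-user accounting state. */
struct model_user {
	size_t locked_pages;  /* pages currently charged to this user */
	size_t limit_pages;   /* e.g. derived from a locked-memory rlimit */
};

/* Pages needed for a map whose per-CPU part is replicated on each CPU. */
static size_t model_map_pages(size_t bytes_per_cpu, unsigned int ncpus)
{
	size_t total = bytes_per_cpu * ncpus;

	return (total + MODEL_PAGE_SIZE - 1) / MODEL_PAGE_SIZE;
}

/* Charge a new map; refuse it if it would exceed the user's limit. */
static bool model_charge(struct model_user *u, size_t pages)
{
	if (u->locked_pages > u->limit_pages ||
	    pages > u->limit_pages - u->locked_pages)
		return false;
	u->locked_pages += pages;
	return true;
}

/* Release the charge when the map is freed (never goes below zero). */
static void model_uncharge(struct model_user *u, size_t pages)
{
	u->locked_pages -= pages < u->locked_pages ? pages : u->locked_pages;
}
```

Each successful map creation consumes part of the budget, so a loop of shrinking allocations stops once the limit is reached instead of draining all percpu space.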

Thanks,
Daniel
Tejun Heo Oct. 18, 2017, 1:25 p.m. UTC | #3
Hello, Daniel.

(cc'ing Dennis)

On Tue, Oct 17, 2017 at 04:55:51PM +0200, Daniel Borkmann wrote:
> The set fixes a splat in devmap percpu allocation when we alloc
> the flush bitmap. Patch 1 is a prerequisite for the fix in patch 2,
> patch 1 is rather small, so if this could be routed via -net, for
> example, with Tejun's Ack that would be good. Patch 3 gets rid of
> remaining PCPU_MIN_UNIT_SIZE checks, which are percpu allocator
> internals and should not be used.
> 
> Thanks!
> 
> Daniel Borkmann (3):
>   mm, percpu: add support for __GFP_NOWARN flag

This looks fine.

>   bpf: fix splat for illegal devmap percpu allocation
>   bpf: do not test for PCPU_MIN_UNIT_SIZE before percpu allocations

These look okay too, but if it helps, the percpu allocator can expose
the maximum supported size / alignment to take the guesswork out.

Also, PCPU_MIN_UNIT_SIZE is what it is simply because nobody needed
anything bigger.  Increasing the size doesn't really cost much, at
least on 64-bit archs.  Is that something we want to consider?
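From the caller's side, the offer to expose limits could look roughly like this: instead of comparing against PCPU_MIN_UNIT_SIZE, a caller consults limits published by the allocator. No such interface exists in this thread; `struct model_pcpu_limits` and its fields are hypothetical, and the power-of-two alignment rule is an assumption of the sketch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical: models limits the percpu allocator could publish. */
struct model_pcpu_limits {
	size_t max_size;   /* largest supported allocation, in bytes */
	size_t max_align;  /* largest supported alignment, in bytes */
};

/* Decide up front whether a request can be served, using published
 * limits instead of allocator internals. */
static bool model_pcpu_request_ok(const struct model_pcpu_limits *lim,
				  size_t size, size_t align)
{
	/* Assumption of this sketch: alignment must be a power of two. */
	bool align_pow2 = align != 0 && (align & (align - 1)) == 0;

	return size != 0 && size <= lim->max_size &&
	       align_pow2 && align <= lim->max_align;
}
```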

Thanks.
Daniel Borkmann Oct. 18, 2017, 2:03 p.m. UTC | #4
On 10/18/2017 03:25 PM, Tejun Heo wrote:
> Hello, Daniel.
>
> (cc'ing Dennis)
>
> On Tue, Oct 17, 2017 at 04:55:51PM +0200, Daniel Borkmann wrote:
>> The set fixes a splat in devmap percpu allocation when we alloc
>> the flush bitmap. Patch 1 is a prerequisite for the fix in patch 2,
>> patch 1 is rather small, so if this could be routed via -net, for
>> example, with Tejun's Ack that would be good. Patch 3 gets rid of
>> remaining PCPU_MIN_UNIT_SIZE checks, which are percpu allocator
>> internals and should not be used.
>>
>> Thanks!
>>
>> Daniel Borkmann (3):
>>    mm, percpu: add support for __GFP_NOWARN flag
>
> This looks fine.

Great, thanks!

>>    bpf: fix splat for illegal devmap percpu allocation
>>    bpf: do not test for PCPU_MIN_UNIT_SIZE before percpu allocations
>
> These look okay too but if it helps percpu allocator can expose the
> maximum size / alignment supported to take out the guessing game too.

At least on the BPF side, there's currently no infrastructure for
exposing the maximum possible allocation size for maps to, e.g., user
space as an indication. There are a few users left in the tree where
having some helpers would make sense, though:

   arch/tile/kernel/setup.c:729:   if (size < PCPU_MIN_UNIT_SIZE)
   arch/tile/kernel/setup.c:730:           size = PCPU_MIN_UNIT_SIZE;
   drivers/net/ethernet/chelsio/libcxgb/libcxgb_ppm.c:346: unsigned int max = (PCPU_MIN_UNIT_SIZE - sizeof(*pools)) << 3;
   drivers/net/ethernet/chelsio/libcxgb/libcxgb_ppm.c:352: /* make sure per cpu pool fits into PCPU_MIN_UNIT_SIZE */
   drivers/scsi/libfc/fc_exch.c:2488:       /* reduce range so per cpu pool fits into PCPU_MIN_UNIT_SIZE pool */
   drivers/scsi/libfc/fc_exch.c:2489:      pool_exch_range = (PCPU_MIN_UNIT_SIZE - sizeof(*pool)) /
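The libcxgb_ppm line above sizes a per-CPU bitmap as `(PCPU_MIN_UNIT_SIZE - sizeof(*pools)) << 3`, i.e. every byte left after the header holds 8 bits. A sketch of that arithmetic with the limit passed in as a parameter, so a caller could use a queried maximum rather than the allocator-internal constant; the helper names are hypothetical. The fc_exch.c line is cut off in the listing, so its divisor is not reproduced; the second helper only shows the general "entries after a header" form.

```c
#include <limits.h>
#include <stddef.h>

/* Bits of bitmap that fit after a header: (limit - header) << 3 when
 * CHAR_BIT is 8, as in the libcxgb_ppm expression. */
static size_t model_bitmap_bits_that_fit(size_t limit, size_t header)
{
	if (header >= limit)
		return 0;
	return (limit - header) * CHAR_BIT;
}

/* Fixed-size entries that fit after a header within limit bytes. */
static size_t model_entries_that_fit(size_t limit, size_t header, size_t entry)
{
	if (entry == 0 || header >= limit)
		return 0;
	return (limit - header) / entry;
}
```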

> Also, the reason why PCPU_MIN_UNIT_SIZE is what it is is because
> nobody needed anything bigger.  Increasing the size doesn't really
> cost much at least on 64bit archs.  Is that something we want to be
> considering?

For devmap (and cpumap) themselves it wouldn't make sense. For the
per-cpu hashtable, we could indeed consider it in the future.
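For a sense of scale on the devmap side: the flush bitmap needs one bit per map entry, rounded up to whole longs, on every CPU. Assuming the 32 KiB value of PCPU_MIN_UNIT_SIZE, a single per-CPU bitmap tops out at 32768 * 8 = 262144 entries. A sketch of that calculation; `model_bitmap_bytes()` mirrors the kernel's BITS_TO_LONGS() rounding but is a hypothetical helper.

```c
#include <limits.h>
#include <stddef.h>

#define MODEL_BITS_PER_LONG (CHAR_BIT * sizeof(unsigned long))

/* Bytes of one per-CPU flush bitmap for n entries, rounded up to
 * whole longs (the same rounding as BITS_TO_LONGS()). */
static size_t model_bitmap_bytes(size_t n_entries)
{
	size_t longs = (n_entries + MODEL_BITS_PER_LONG - 1) / MODEL_BITS_PER_LONG;

	return longs * sizeof(unsigned long);
}

/* Largest entry count whose per-CPU bitmap still fits in limit bytes. */
static size_t model_max_entries(size_t limit)
{
	return (limit / sizeof(unsigned long)) * MODEL_BITS_PER_LONG;
}
```

Larger maps would exceed the assumed limit, which is the kind of request patch 2 is meant to fail without a splat.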

Thanks,
Daniel
Daniel Borkmann Oct. 18, 2017, 2:22 p.m. UTC | #5
On 10/18/2017 04:03 PM, Daniel Borkmann wrote:
> For devmap (and cpumap) itself it wouldn't make sense. For per-cpu
> hashtable we could indeed consider it in the future.

A higher priority, imo, would be to make the allocation itself faster,
though. I remember we talked about this back in May wrt the hashtable,
but I kind of lost track of whether there was an update on this in
the meantime. ;-)

Cheers,
Daniel
Alexei Starovoitov Oct. 18, 2017, 3:28 p.m. UTC | #6
On Wed, Oct 18, 2017 at 7:22 AM, Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> Higher prio imo would be to make the allocation itself faster
> though, I remember we talked about this back in May wrt hashtable,
> but I kind of lost track whether there was an update on this in
> the mean time. ;-)

The new percpu allocator by Dennis fixed those issues. It's in 4.14.
Daniel Borkmann Oct. 18, 2017, 3:31 p.m. UTC | #7
On 10/18/2017 05:28 PM, Alexei Starovoitov wrote:
> On Wed, Oct 18, 2017 at 7:22 AM, Daniel Borkmann <daniel@iogearbox.net> wrote:
>>
>> Higher prio imo would be to make the allocation itself faster
>> though, I remember we talked about this back in May wrt hashtable,
>> but I kind of lost track whether there was an update on this in
>> the mean time. ;-)
>
> new percpu allocator by Dennis fixed those issues. It's in 4.14

Ah, perfect!
Dennis Zhou Oct. 18, 2017, 9:45 p.m. UTC | #8
Hi Daniel and Tejun,

On Wed, Oct 18, 2017 at 06:25:26AM -0700, Tejun Heo wrote:
> > Daniel Borkmann (3):
> >   mm, percpu: add support for __GFP_NOWARN flag
> 
> This looks fine.
> 

Looks good to me too.

> >   bpf: fix splat for illegal devmap percpu allocation
> >   bpf: do not test for PCPU_MIN_UNIT_SIZE before percpu allocations
> 
> These look okay too but if it helps percpu allocator can expose the
> maximum size / alignment supported to take out the guessing game too.
> 

If we want it, I can add this once we've addressed the point below.

> Also, the reason why PCPU_MIN_UNIT_SIZE is what it is is because
> nobody needed anything bigger.  Increasing the size doesn't really
> cost much at least on 64bit archs.  Is that something we want to be
> considering?
> 

I'm not sure I see the reason why we can't match the minimum allocation
size with the unit size. It seems weird to determine the maximum
allocation size from a lower bound on the unit size.

Thanks,
Dennis
David Miller Oct. 19, 2017, 12:14 p.m. UTC | #9
From: Daniel Borkmann <daniel@iogearbox.net>
Date: Tue, 17 Oct 2017 16:55:51 +0200

> The set fixes a splat in devmap percpu allocation when we alloc
> the flush bitmap. Patch 1 is a prerequisite for the fix in patch 2,
> patch 1 is rather small, so if this could be routed via -net, for
> example, with Tejun's Ack that would be good. Patch 3 gets rid of
> remaining PCPU_MIN_UNIT_SIZE checks, which are percpu allocator
> internals and should not be used.

Series applied.
Tejun Heo Oct. 21, 2017, 4 p.m. UTC | #10
Hello,

On Wed, Oct 18, 2017 at 04:45:08PM -0500, Dennis Zhou wrote:
> I'm not sure I see the reason we can't match the minimum allocation size
> with the unit size? It seems weird to arbitrate the maximum allocation
> size given a lower bound on the unit size.

idk, it can be weird for the maximum allowed allocation size to vary
widely depending on how the machine boots up.

Thanks.