mbox series

[RFC,bpf-next,0/4] bpf: Add support for XDP programs in DEVMAPs

Message ID 20200522010526.14649-1-dsahern@kernel.org
Headers show
Series bpf: Add support for XDP programs in DEVMAPs | expand

Message

David Ahern May 22, 2020, 1:05 a.m. UTC
Implementation of Daniel's proposal for allowing DEVMAP entries to be
a device index, program id pair. Daniel suggested an fd to specify the
program, but that seems odd to me that you insert the value as an fd, but
read it back as an id since the fd can be closed.

David Ahern (4):
  bpf: Handle 8-byte values in DEVMAP and DEVMAP_HASH
  bpf: Add support to attach bpf program to a devmap
  xdp: Add xdp_txq_info to xdp_buff
  bpftool: Add SEC name for xdp programs attached to device map

 include/linux/bpf.h            |   5 ++
 include/net/xdp.h              |   5 ++
 include/uapi/linux/bpf.h       |   3 +
 kernel/bpf/devmap.c            | 132 +++++++++++++++++++++++++++++----
 net/core/dev.c                 |  18 +++++
 net/core/filter.c              |  17 +++++
 tools/include/uapi/linux/bpf.h |   3 +
 tools/lib/bpf/libbpf.c         |   2 +
 8 files changed, 170 insertions(+), 15 deletions(-)

Comments

Jesper Dangaard Brouer May 22, 2020, 11:17 a.m. UTC | #1
On Thu, 21 May 2020 19:05:22 -0600
David Ahern <dsahern@kernel.org> wrote:

> Implementation of Daniel's proposal for allowing DEVMAP entries to be
> a device index, program id pair. Daniel suggested an fd to specify the
> program, but that seems odd to me that you insert the value as an fd, but
> read it back as an id since the fd can be closed.

Great that you are working on this :-)

Lorenzo (cc lorenzo@kernel.org) is working on a similar approach for
cpumap, please coordinate/Cc him on these patches.
Toke Høiland-Jørgensen May 22, 2020, 3:59 p.m. UTC | #2
David Ahern <dsahern@kernel.org> writes:

> Implementation of Daniel's proposal for allowing DEVMAP entries to be
> a device index, program id pair. Daniel suggested an fd to specify the
> program, but that seems odd to me that you insert the value as an fd, but
> read it back as an id since the fd can be closed.

While I can be sympathetic to the argument that it seems odd, every
other API uses FD for insert and returns ID, so why make it different
here? Also, the choice has privilege implications, since the CAP_BPF
series explicitly makes going from ID->FD a more privileged operation
than just querying the ID.

-Toke
David Ahern May 22, 2020, 5:46 p.m. UTC | #3
On 5/22/20 9:59 AM, Toke Høiland-Jørgensen wrote:
> David Ahern <dsahern@kernel.org> writes:
> 
>> Implementation of Daniel's proposal for allowing DEVMAP entries to be
>> a device index, program id pair. Daniel suggested an fd to specify the
>> program, but that seems odd to me that you insert the value as an fd, but
>> read it back as an id since the fd can be closed.
> 
> While I can be sympathetic to the argument that it seems odd, every
> other API uses FD for insert and returns ID, so why make it different
> here? Also, the choice has privilege implications, since the CAP_BPF
> series explicitly makes going from ID->FD a more privileged operation
> than just querying the ID.
> 

I do not like the model where the kernel changes the value the user
pushed down.
Toke Høiland-Jørgensen May 25, 2020, 12:15 p.m. UTC | #4
David Ahern <dsahern@gmail.com> writes:

> On 5/22/20 9:59 AM, Toke Høiland-Jørgensen wrote:
>> David Ahern <dsahern@kernel.org> writes:
>> 
>>> Implementation of Daniel's proposal for allowing DEVMAP entries to be
>>> a device index, program id pair. Daniel suggested an fd to specify the
>>> program, but that seems odd to me that you insert the value as an fd, but
>>> read it back as an id since the fd can be closed.
>> 
>> While I can be sympathetic to the argument that it seems odd, every
>> other API uses FD for insert and returns ID, so why make it different
>> here? Also, the choice has privilege implications, since the CAP_BPF
>> series explicitly makes going from ID->FD a more privileged operation
>> than just querying the ID.
>> 
>
> I do not like the model where the kernel changes the value the user
> pushed down.

Yet it's what we do in every other interface where a user needs to
supply a program, including in prog array maps. So let's not create a
new inconsistent interface here...

-Toke
Jesper Dangaard Brouer May 25, 2020, 12:47 p.m. UTC | #5
On Mon, 25 May 2020 14:15:32 +0200
Toke Høiland-Jørgensen <toke@redhat.com> wrote:

> David Ahern <dsahern@gmail.com> writes:
> 
> > On 5/22/20 9:59 AM, Toke Høiland-Jørgensen wrote:  
> >> David Ahern <dsahern@kernel.org> writes:
> >>   
> >>> Implementation of Daniel's proposal for allowing DEVMAP entries to be
> >>> a device index, program id pair. Daniel suggested an fd to specify the
> >>> program, but that seems odd to me that you insert the value as an fd, but
> >>> read it back as an id since the fd can be closed.  
> >> 
> >> While I can be sympathetic to the argument that it seems odd, every
> >> other API uses FD for insert and returns ID, so why make it different
> >> here? Also, the choice has privilege implications, since the CAP_BPF
> >> series explicitly makes going from ID->FD a more privileged operation
> >> than just querying the ID.

Sorry, I don't follow.
Can someone explain why is inserting an ID is a privilege problem?

   
> >
> > I do not like the model where the kernel changes the value the user
> > pushed down.  
> 
> Yet it's what we do in every other interface where a user needs to
> supply a program, including in prog array maps. So let's not create a
> new inconsistent interface here...

I sympathize with Ahern on this.  It seems very weird to insert/write
one value-type, but read another value-type.
Toke Høiland-Jørgensen May 25, 2020, 12:56 p.m. UTC | #6
Jesper Dangaard Brouer <brouer@redhat.com> writes:

> On Mon, 25 May 2020 14:15:32 +0200
> Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
>> David Ahern <dsahern@gmail.com> writes:
>> 
>> > On 5/22/20 9:59 AM, Toke Høiland-Jørgensen wrote:  
>> >> David Ahern <dsahern@kernel.org> writes:
>> >>   
>> >>> Implementation of Daniel's proposal for allowing DEVMAP entries to be
>> >>> a device index, program id pair. Daniel suggested an fd to specify the
>> >>> program, but that seems odd to me that you insert the value as an fd, but
>> >>> read it back as an id since the fd can be closed.  
>> >> 
>> >> While I can be sympathetic to the argument that it seems odd, every
>> >> other API uses FD for insert and returns ID, so why make it different
>> >> here? Also, the choice has privilege implications, since the CAP_BPF
>> >> series explicitly makes going from ID->FD a more privileged operation
>> >> than just querying the ID.
>
> Sorry, I don't follow.
> Can someone explain why is inserting an ID is a privilege problem?

See description here:
https://lore.kernel.org/bpf/20200513230355.7858-1-alexei.starovoitov@gmail.com/

Specifically, this part:

> Consolidating all CAP checks at load time makes security model similar to
> open() syscall. Once the user got an FD it can do everything with it.
> read/write/poll don't check permissions. The same way when bpf_prog_load
> command returns an FD the user can do everything (including attaching,
> detaching, and bpf_test_run).
> 
> The important design decision is to allow ID->FD transition for
> CAP_SYS_ADMIN only. What it means that user processes can run
> with CAP_BPF and CAP_NET_ADMIN and they will not be able to affect each
> other unless they pass FDs via scm_rights or via pinning in bpffs.


>> > I do not like the model where the kernel changes the value the user
>> > pushed down.  
>> 
>> Yet it's what we do in every other interface where a user needs to
>> supply a program, including in prog array maps. So let's not create a
>> new inconsistent interface here...
>
> I sympathize with Ahern on this.  It seems very weird to insert/write
> one value-type, but read another value-type.

Yeah, I do kinda agree that it's a bit weird. But it's what we do
everywhere else, so I think consistency wins out here. There might be an
argument that maps should be different (because they're conceptually a
read/write data structure not a syscall return value). But again, we
already have a map type that takes prog IDs, and that already does the
rewriting, so doing it different here would be even weirder...

-Toke
Daniel Borkmann May 26, 2020, 11:36 p.m. UTC | #7
On 5/25/20 2:56 PM, Toke Høiland-Jørgensen wrote:
> Jesper Dangaard Brouer <brouer@redhat.com> writes:
>> On Mon, 25 May 2020 14:15:32 +0200
>> Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>> David Ahern <dsahern@gmail.com> writes:
>>>> On 5/22/20 9:59 AM, Toke Høiland-Jørgensen wrote:
>>>>> David Ahern <dsahern@kernel.org> writes:
>>>>>    
>>>>>> Implementation of Daniel's proposal for allowing DEVMAP entries to be
>>>>>> a device index, program id pair. Daniel suggested an fd to specify the
>>>>>> program, but that seems odd to me that you insert the value as an fd, but
>>>>>> read it back as an id since the fd can be closed.
>>>>>
>>>>> While I can be sympathetic to the argument that it seems odd, every
>>>>> other API uses FD for insert and returns ID, so why make it different
>>>>> here? Also, the choice has privilege implications, since the CAP_BPF
>>>>> series explicitly makes going from ID->FD a more privileged operation
>>>>> than just querying the ID.

[...]

>> I sympathize with Ahern on this.  It seems very weird to insert/write
>> one value-type, but read another value-type.
> 
> Yeah, I do kinda agree that it's a bit weird. But it's what we do
> everywhere else, so I think consistency wins out here. There might be an
> argument that maps should be different (because they're conceptually a
> read/write data structure not a syscall return value). But again, we
> already have a map type that takes prog IDs, and that already does the
> rewriting, so doing it different here would be even weirder...

Sorry for the late reply. Agree, it would at least be consistent to what is done
in tail call maps, and the XDP netlink API where you have the fd->id in both cases.
Either way, quick glance over the patches, the direction of this RFC looks good to
me, better fit than the prior XDP egress approaches.

Thanks,
Daniel