[RFC,bpf-next,0/5] Convert iproute2 to use libbpf (WIP)
mbox series

Message ID 20190820114706.18546-1-toke@redhat.com
Headers show
Series
  • Convert iproute2 to use libbpf (WIP)
Related show

Message

Toke Høiland-Jørgensen Aug. 20, 2019, 11:47 a.m. UTC
iproute2 uses its own bpf loader to load eBPF programs, which has
evolved separately from libbpf. Since we are now standardising on
libbpf, this becomes a problem as iproute2 is slowly accumulating
feature incompatibilities with libbpf-based loaders. In particular,
iproute2 has its own (expanded) version of the map definition struct,
which makes it difficult to write programs that can be loaded with both
custom loaders and iproute2.

This series seeks to address this by converting iproute2 to using libbpf
for all its bpf needs. This version is an early proof-of-concept RFC, to
get some feedback on whether people think this is the right direction.

What this series does is the following:

- Updates the libbpf map definition struct to match that of iproute2
  (patch 1).
- Adds functionality to libbpf to support automatic pinning of maps when
  loading an eBPF program, while re-using pinned maps if they already
  exist (patches 2-3).
- Modifies iproute2 to make it possible to compile it against libbpf
  without affecting any existing functionality (patch 4).
- Changes the iproute2 eBPF loader to use libbpf for loading XDP
  programs (patch 5).


As this is an early PoC, there are still a few missing pieces before
this can be merged. Including (but probably not limited to):

- Consolidate the map definition struct in the bpf_helpers.h file in the
  kernel tree. This contains a different, and incompatible, update to
  the struct. Since the iproute2 version has actually been released for
  use outside the kernel tree (and thus is subject to API stability
  constraints), I think it makes the most sense to keep that, and port
  the selftests to use it.

- The iproute2 loader supports automatically populating map-in-map
  definitions on load. This needs to be added to libbpf as well. There
  is an implementation of this in the selftests in the kernel tree,
  which will have to be ported (related to the previous point).

- The iproute2 port needs to be completed; this means at least
  supporting TC eBPF programs as well, figuring out how to deal with
  cBPF programs, and getting the verbose output back to the same state
  as before the port. Also, I guess the iproute2 maintainers need to ACK
  that they are good with adding a dependency on libbpf.

- Some of the code added to libbpf in patch 2 in this series include
  code derived from iproute2, which is GPLv2+. So it will need to be
  re-licensed to be usable in libbpf. Since `git blame` indicated that
  the original code was written by Daniel, I figure he can ACK that
  relicensing before applying the patches :)


Please take a look at this series and let me know if you agree
that this is the right direction to go. Assuming there's consensus that
it is, I'll focus on getting the rest of the libbpf patches ready for
merging. I'll send those as a separate series, and hold off on the
iproute2 patches until they are merged; but for this version I'm
including both in one series so it's easier to see the context.

-Toke


libbpf:
Toke Høiland-Jørgensen (3):
  libbpf: Add map definition struct fields from iproute2
  libbpf: Add support for auto-pinning of maps with reuse on program
    load
  libbpf: Add support for specifying map pinning path via callback

 tools/lib/bpf/libbpf.c | 205 +++++++++++++++++++++++++++++++++++++++--
 tools/lib/bpf/libbpf.h |  18 ++++
 2 files changed, 214 insertions(+), 9 deletions(-)

iproute2:
Toke Høiland-Jørgensen (2):
  iproute2: Allow compiling against libbpf
  iproute2: Support loading XDP programs with libbpf

 configure          |  16 +++++
 include/bpf_util.h |   6 +-
 ip/ipvrf.c         |   4 +-
 lib/bpf.c          | 148 ++++++++++++++++++++++++++++++++++++---------
 4 files changed, 142 insertions(+), 32 deletions(-)

Comments

Alexei Starovoitov Aug. 21, 2019, 7:26 p.m. UTC | #1
On Tue, Aug 20, 2019 at 01:47:01PM +0200, Toke Høiland-Jørgensen wrote:
> iproute2 uses its own bpf loader to load eBPF programs, which has
> evolved separately from libbpf. Since we are now standardising on
> libbpf, this becomes a problem as iproute2 is slowly accumulating
> feature incompatibilities with libbpf-based loaders. In particular,
> iproute2 has its own (expanded) version of the map definition struct,
> which makes it difficult to write programs that can be loaded with both
> custom loaders and iproute2.
> 
> This series seeks to address this by converting iproute2 to using libbpf
> for all its bpf needs. This version is an early proof-of-concept RFC, to
> get some feedback on whether people think this is the right direction.
> 
> What this series does is the following:
> 
> - Updates the libbpf map definition struct to match that of iproute2
>   (patch 1).
> - Adds functionality to libbpf to support automatic pinning of maps when
>   loading an eBPF program, while re-using pinned maps if they already
>   exist (patches 2-3).
> - Modifies iproute2 to make it possible to compile it against libbpf
>   without affecting any existing functionality (patch 4).
> - Changes the iproute2 eBPF loader to use libbpf for loading XDP
>   programs (patch 5).
> 
> 
> As this is an early PoC, there are still a few missing pieces before
> this can be merged. Including (but probably not limited to):
> 
> - Consolidate the map definition struct in the bpf_helpers.h file in the
>   kernel tree. This contains a different, and incompatible, update to
>   the struct. Since the iproute2 version has actually been released for
>   use outside the kernel tree (and thus is subject to API stability
>   constraints), I think it makes the most sense to keep that, and port
>   the selftests to use it.

It sounds like you're implying that existing libbpf format is not uapi.
It is and we cannot break it.
If patch 1 means breakage for existing pre-compiled .o that won't load
with new libbpf then we cannot use this method.
Recompiling .o with new libbpf definition of bpf_map_def isn't an option.
libbpf has to be smart before/after and recognize both old and iproute2 format.
Andrii Nakryiko Aug. 21, 2019, 8:30 p.m. UTC | #2
On Tue, Aug 20, 2019 at 4:47 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> iproute2 uses its own bpf loader to load eBPF programs, which has
> evolved separately from libbpf. Since we are now standardising on
> libbpf, this becomes a problem as iproute2 is slowly accumulating
> feature incompatibilities with libbpf-based loaders. In particular,
> iproute2 has its own (expanded) version of the map definition struct,
> which makes it difficult to write programs that can be loaded with both
> custom loaders and iproute2.
>
> This series seeks to address this by converting iproute2 to using libbpf
> for all its bpf needs. This version is an early proof-of-concept RFC, to
> get some feedback on whether people think this is the right direction.
>
> What this series does is the following:
>
> - Updates the libbpf map definition struct to match that of iproute2
>   (patch 1).


Hi Toke,

Thanks for taking a stab at unifying libbpf and iproute2 loaders. I'm
totally in support of making iproute2 use libbpf to load/initialize
BPF programs. But I'm against adding iproute2-specific fields to
libbpf's bpf_map_def definitions to support this.

I've proposed the plan of extending libbpf's supported features so
that it can be used to load iproute2-style BPF programs earlier,
please see discussions in [0] and [1]. I think instead of emulating
iproute2 way of matching everything based on user-specified internal
IDs, which doesn't provide good user experience and is quite easy to
get wrong, we should support same scenarios with better declarative
syntax and in a less error-prone way. I believe we can do that by
relying on BTF more heavily (again, please check some of my proposals
in [0], [1], and discussion with Daniel in those threads). It will
feel more natural and be more straightforward to follow. It would be
great if you can lend a hand in implementing pieces of that plan!

I'm currently on vacation, so my availability is very sparse, but I'd
be happy to discuss this further, if need be.

  [0] https://lore.kernel.org/bpf/CAEf4BzbfdG2ub7gCi0OYqBrUoChVHWsmOntWAkJt47=FE+km+A@xxxxxxxxxxxxxx/
  [1] https://www.spinics.net/lists/bpf/msg03976.html

> - Adds functionality to libbpf to support automatic pinning of maps when
>   loading an eBPF program, while re-using pinned maps if they already
>   exist (patches 2-3).
> - Modifies iproute2 to make it possible to compile it against libbpf
>   without affecting any existing functionality (patch 4).
> - Changes the iproute2 eBPF loader to use libbpf for loading XDP
>   programs (patch 5).
>
>
> As this is an early PoC, there are still a few missing pieces before
> this can be merged. Including (but probably not limited to):
>
> - Consolidate the map definition struct in the bpf_helpers.h file in the
>   kernel tree. This contains a different, and incompatible, update to
>   the struct. Since the iproute2 version has actually been released for
>   use outside the kernel tree (and thus is subject to API stability
>   constraints), I think it makes the most sense to keep that, and port
>   the selftests to use it.
>
> - The iproute2 loader supports automatically populating map-in-map
>   definitions on load. This needs to be added to libbpf as well. There
>   is an implementation of this in the selftests in the kernel tree,
>   which will have to be ported (related to the previous point).
>
> - The iproute2 port needs to be completed; this means at least
>   supporting TC eBPF programs as well, figuring out how to deal with
>   cBPF programs, and getting the verbose output back to the same state
>   as before the port. Also, I guess the iproute2 maintainers need to ACK
>   that they are good with adding a dependency on libbpf.
>
> - Some of the code added to libbpf in patch 2 in this series include
>   code derived from iproute2, which is GPLv2+. So it will need to be
>   re-licensed to be usable in libbpf. Since `git blame` indicated that
>   the original code was written by Daniel, I figure he can ACK that
>   relicensing before applying the patches :)
>
>
> Please take a look at this series and let me know if you agree
> that this is the right direction to go. Assuming there's consensus that
> it is, I'll focus on getting the rest of the libbpf patches ready for
> merging. I'll send those as a separate series, and hold off on the
> iproute2 patches until they are merged; but for this version I'm
> including both in one series so it's easier to see the context.
>
> -Toke
>
>
> libbpf:
> Toke Høiland-Jørgensen (3):
>   libbpf: Add map definition struct fields from iproute2
>   libbpf: Add support for auto-pinning of maps with reuse on program
>     load
>   libbpf: Add support for specifying map pinning path via callback
>
>  tools/lib/bpf/libbpf.c | 205 +++++++++++++++++++++++++++++++++++++++--
>  tools/lib/bpf/libbpf.h |  18 ++++
>  2 files changed, 214 insertions(+), 9 deletions(-)
>
> iproute2:
> Toke Høiland-Jørgensen (2):
>   iproute2: Allow compiling against libbpf
>   iproute2: Support loading XDP programs with libbpf
>
>  configure          |  16 +++++
>  include/bpf_util.h |   6 +-
>  ip/ipvrf.c         |   4 +-
>  lib/bpf.c          | 148 ++++++++++++++++++++++++++++++++++++---------
>  4 files changed, 142 insertions(+), 32 deletions(-)
>
>
> --
> 2.22.1
>
Toke Høiland-Jørgensen Aug. 21, 2019, 9 p.m. UTC | #3
Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:

> On Tue, Aug 20, 2019 at 01:47:01PM +0200, Toke Høiland-Jørgensen wrote:
>> iproute2 uses its own bpf loader to load eBPF programs, which has
>> evolved separately from libbpf. Since we are now standardising on
>> libbpf, this becomes a problem as iproute2 is slowly accumulating
>> feature incompatibilities with libbpf-based loaders. In particular,
>> iproute2 has its own (expanded) version of the map definition struct,
>> which makes it difficult to write programs that can be loaded with both
>> custom loaders and iproute2.
>> 
>> This series seeks to address this by converting iproute2 to using libbpf
>> for all its bpf needs. This version is an early proof-of-concept RFC, to
>> get some feedback on whether people think this is the right direction.
>> 
>> What this series does is the following:
>> 
>> - Updates the libbpf map definition struct to match that of iproute2
>>   (patch 1).
>> - Adds functionality to libbpf to support automatic pinning of maps when
>>   loading an eBPF program, while re-using pinned maps if they already
>>   exist (patches 2-3).
>> - Modifies iproute2 to make it possible to compile it against libbpf
>>   without affecting any existing functionality (patch 4).
>> - Changes the iproute2 eBPF loader to use libbpf for loading XDP
>>   programs (patch 5).
>> 
>> 
>> As this is an early PoC, there are still a few missing pieces before
>> this can be merged. Including (but probably not limited to):
>> 
>> - Consolidate the map definition struct in the bpf_helpers.h file in the
>>   kernel tree. This contains a different, and incompatible, update to
>>   the struct. Since the iproute2 version has actually been released for
>>   use outside the kernel tree (and thus is subject to API stability
>>   constraints), I think it makes the most sense to keep that, and port
>>   the selftests to use it.
>
> It sounds like you're implying that existing libbpf format is not
> uapi.

No, that's not what I meant... See below.

> It is and we cannot break it.
> If patch 1 means breakage for existing pre-compiled .o that won't load
> with new libbpf then we cannot use this method.
> Recompiling .o with new libbpf definition of bpf_map_def isn't an option.
> libbpf has to be smart before/after and recognize both old and iproute2 format.

The libbpf.h definition of struct bpf_map_def is compatible with the one
used in iproute2. In libbpf.h, the struct only contains five fields
(type, key_size, value_size, max_entries and flags), and iproute2 adds
another 4 (id, pinning, inner_id and inner_idx; these are the ones in
patch 1 in this series).

The issue I was alluding to above is that the bpf_helpers.h file in the
kernel selftests directory *also* extends the bpf_map_def struct, and
adds two *different* fields (inner_map_idx and numa_mode). The former is
used to implement the same map-in-map definition functionality that
iproute2 has, but with different semantics. The latter is additional to
that, and I'm planning to add that to this series.

Since bpf_helpers.h is *not* part of libbpf (yet), this will make it
possible to keep API (and ABI) compatibility with both iproute2 and
libbpf. As in, old .o files will still load with libbpf after this
series, they just won't be able to use the new automatic pinning
feature.

-Toke
Toke Høiland-Jørgensen Aug. 21, 2019, 9:07 p.m. UTC | #4
Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:

> On Tue, Aug 20, 2019 at 4:47 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>
>> iproute2 uses its own bpf loader to load eBPF programs, which has
>> evolved separately from libbpf. Since we are now standardising on
>> libbpf, this becomes a problem as iproute2 is slowly accumulating
>> feature incompatibilities with libbpf-based loaders. In particular,
>> iproute2 has its own (expanded) version of the map definition struct,
>> which makes it difficult to write programs that can be loaded with both
>> custom loaders and iproute2.
>>
>> This series seeks to address this by converting iproute2 to using libbpf
>> for all its bpf needs. This version is an early proof-of-concept RFC, to
>> get some feedback on whether people think this is the right direction.
>>
>> What this series does is the following:
>>
>> - Updates the libbpf map definition struct to match that of iproute2
>>   (patch 1).
>
>
> Hi Toke,
>
> Thanks for taking a stab at unifying libbpf and iproute2 loaders. I'm
> totally in support of making iproute2 use libbpf to load/initialize
> BPF programs. But I'm against adding iproute2-specific fields to
> libbpf's bpf_map_def definitions to support this.
>
> I've proposed the plan of extending libbpf's supported features so
> that it can be used to load iproute2-style BPF programs earlier,
> please see discussions in [0] and [1].

Yeah, I've seen that discussion, and agree that longer term this is
probably a better way to do map-in-map definitions.

However, I view your proposal as complementary to this series: we'll
probably also want the BTF-based definition to work with iproute2, and
that means iproute2 needs to be ported to libbpf. But iproute2 needs to
be backwards compatible with the format it supports now, and, well, this
series is the simplest way to achieve that IMO :)

> I think instead of emulating iproute2 way of matching everything based
> on user-specified internal IDs, which doesn't provide good user
> experience and is quite easy to get wrong, we should support same
> scenarios with better declarative syntax and in a less error-prone
> way. I believe we can do that by relying on BTF more heavily (again,
> please check some of my proposals in [0], [1], and discussion with
> Daniel in those threads). It will feel more natural and be more
> straightforward to follow. It would be great if you can lend a hand in
> implementing pieces of that plan!
>
> I'm currently on vacation, so my availability is very sparse, but I'd
> be happy to discuss this further, if need be.

Happy to collaborate on your proposal when you're back from vacation;
but as I said above, I believe this is a complementary longer-term
thing...

-Toke
Andrii Nakryiko Aug. 22, 2019, 7:49 a.m. UTC | #5
On Wed, Aug 21, 2019 at 2:07 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
>
> > On Tue, Aug 20, 2019 at 4:47 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> >>
> >> iproute2 uses its own bpf loader to load eBPF programs, which has
> >> evolved separately from libbpf. Since we are now standardising on
> >> libbpf, this becomes a problem as iproute2 is slowly accumulating
> >> feature incompatibilities with libbpf-based loaders. In particular,
> >> iproute2 has its own (expanded) version of the map definition struct,
> >> which makes it difficult to write programs that can be loaded with both
> >> custom loaders and iproute2.
> >>
> >> This series seeks to address this by converting iproute2 to using libbpf
> >> for all its bpf needs. This version is an early proof-of-concept RFC, to
> >> get some feedback on whether people think this is the right direction.
> >>
> >> What this series does is the following:
> >>
> >> - Updates the libbpf map definition struct to match that of iproute2
> >>   (patch 1).
> >
> >
> > Hi Toke,
> >
> > Thanks for taking a stab at unifying libbpf and iproute2 loaders. I'm
> > totally in support of making iproute2 use libbpf to load/initialize
> > BPF programs. But I'm against adding iproute2-specific fields to
> > libbpf's bpf_map_def definitions to support this.
> >
> > I've proposed the plan of extending libbpf's supported features so
> > that it can be used to load iproute2-style BPF programs earlier,
> > please see discussions in [0] and [1].
>
> Yeah, I've seen that discussion, and agree that longer term this is
> probably a better way to do map-in-map definitions.
>
> However, I view your proposal as complementary to this series: we'll
> probably also want the BTF-based definition to work with iproute2, and
> that means iproute2 needs to be ported to libbpf. But iproute2 needs to
> be backwards compatible with the format it supports now, and, well, this
> series is the simplest way to achieve that IMO :)

Ok, I understand that. But I'd still want to avoid adding extra cruft
to libbpf just for backwards-compatibility with *exact* iproute2
format. Libbpf as a whole is trying to move away from relying on
binary bpf_map_def and into using BTF-defined map definitions, and
this patch series is a step backwards in that regard, that adds,
essentially, already outdated stuff that we'll need to support forever
(I mean those extra fields in bpf_map_def, that will stay there
forever).

We've discussed one way to deal with it, IMO, in a cleaner way. It can
be done in few steps:

1. I originally wanted BTF-defined map definitions to ignore unknown
fields. It shouldn't be a default mode, but it should be supported
(and of course is very easy to add). So let's add that and let libbpf
ignore unknown stuff.

2. Then to let iproute2 loader deal with backwards-compatibility for
libbpf-incompatible bpf_elf_map, we need to "pass-through" all those
fields so that users of libbpf (iproute2 loader, in this case) can
make use of it. The easiest and cleanest way to do this is to expose
BTF ID of a type describing each map entry and let iproute2 process
that in whichever way it sees fit.

Luckily, bpf_elf_map is compatible in `type` field, which will let
libbpf recognize bpf_elf_map as map definition. All the rest setup
will be done by iproute2, by processing BTF of bpf_elf_map, which will
let it set up map sizes, flags and do all of its map-in-map magic.

The only additions to libbpf in this case would be a new `__u32
bpf_map__btf_id(struct bpf_map* map);` API.

I haven't written any code and haven't 100% checked that this will
cover everything, but I think we should try. This will allow to let
users of libbpf do custom stuff with map definitions without having to
put all this extra logic into libbpf itself, which I think is
desirable outcome.


>
> > I think instead of emulating iproute2 way of matching everything based
> > on user-specified internal IDs, which doesn't provide good user
> > experience and is quite easy to get wrong, we should support same
> > scenarios with better declarative syntax and in a less error-prone
> > way. I believe we can do that by relying on BTF more heavily (again,
> > please check some of my proposals in [0], [1], and discussion with
> > Daniel in those threads). It will feel more natural and be more
> > straightforward to follow. It would be great if you can lend a hand in
> > implementing pieces of that plan!
> >
> > I'm currently on vacation, so my availability is very sparse, but I'd
> > be happy to discuss this further, if need be.
>
> Happy to collaborate on your proposal when you're back from vacation;
> but as I said above, I believe this is a complementary longer-term
> thing...
>
> -Toke
Andrii Nakryiko Aug. 22, 2019, 7:52 a.m. UTC | #6
On Wed, Aug 21, 2019 at 4:29 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:
>
> > On Tue, Aug 20, 2019 at 01:47:01PM +0200, Toke Høiland-Jørgensen wrote:
> >> iproute2 uses its own bpf loader to load eBPF programs, which has
> >> evolved separately from libbpf. Since we are now standardising on
> >> libbpf, this becomes a problem as iproute2 is slowly accumulating
> >> feature incompatibilities with libbpf-based loaders. In particular,
> >> iproute2 has its own (expanded) version of the map definition struct,
> >> which makes it difficult to write programs that can be loaded with both
> >> custom loaders and iproute2.
> >>
> >> This series seeks to address this by converting iproute2 to using libbpf
> >> for all its bpf needs. This version is an early proof-of-concept RFC, to
> >> get some feedback on whether people think this is the right direction.
> >>
> >> What this series does is the following:
> >>
> >> - Updates the libbpf map definition struct to match that of iproute2
> >>   (patch 1).
> >> - Adds functionality to libbpf to support automatic pinning of maps when
> >>   loading an eBPF program, while re-using pinned maps if they already
> >>   exist (patches 2-3).
> >> - Modifies iproute2 to make it possible to compile it against libbpf
> >>   without affecting any existing functionality (patch 4).
> >> - Changes the iproute2 eBPF loader to use libbpf for loading XDP
> >>   programs (patch 5).
> >>
> >>
> >> As this is an early PoC, there are still a few missing pieces before
> >> this can be merged. Including (but probably not limited to):
> >>
> >> - Consolidate the map definition struct in the bpf_helpers.h file in the
> >>   kernel tree. This contains a different, and incompatible, update to
> >>   the struct. Since the iproute2 version has actually been released for
> >>   use outside the kernel tree (and thus is subject to API stability
> >>   constraints), I think it makes the most sense to keep that, and port
> >>   the selftests to use it.
> >
> > It sounds like you're implying that existing libbpf format is not
> > uapi.
>
> No, that's not what I meant... See below.
>
> > It is and we cannot break it.
> > If patch 1 means breakage for existing pre-compiled .o that won't load
> > with new libbpf then we cannot use this method.
> > Recompiling .o with new libbpf definition of bpf_map_def isn't an option.
> > libbpf has to be smart before/after and recognize both old and iproute2 format.
>
> The libbpf.h definition of struct bpf_map_def is compatible with the one
> used in iproute2. In libbpf.h, the struct only contains five fields
> (type, key_size, value_size, max_entries and flags), and iproute2 adds
> another 4 (id, pinning, inner_id and inner_idx; these are the ones in
> patch 1 in this series).
>
> The issue I was alluding to above is that the bpf_helpers.h file in the
> kernel selftests directory *also* extends the bpf_map_def struct, and
> adds two *different* fields (inner_map_idx and numa_mode). The former is
> used to implement the same map-in-map definition functionality that
> iproute2 has, but with different semantics. The latter is additional to
> that, and I'm planning to add that to this series.
>
> Since bpf_helpers.h is *not* part of libbpf (yet), this will make it

We should start considering it as if it was, so if we can avoid adding
stuff that I'd need to untangle to move it into libbpf, I'd rather
avoid it.
We've already prepared this move by relicensing bpf_helpers.h. Moving
it into libbpf itself is immediate next thing I'll do when I'm back.

> possible to keep API (and ABI) compatibility with both iproute2 and
> libbpf. As in, old .o files will still load with libbpf after this
> series, they just won't be able to use the new automatic pinning
> feature.
>
> -Toke
Daniel Borkmann Aug. 22, 2019, 8:33 a.m. UTC | #7
On 8/22/19 9:49 AM, Andrii Nakryiko wrote:
> On Wed, Aug 21, 2019 at 2:07 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
>>> On Tue, Aug 20, 2019 at 4:47 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>>>
>>>> iproute2 uses its own bpf loader to load eBPF programs, which has
>>>> evolved separately from libbpf. Since we are now standardising on
>>>> libbpf, this becomes a problem as iproute2 is slowly accumulating
>>>> feature incompatibilities with libbpf-based loaders. In particular,
>>>> iproute2 has its own (expanded) version of the map definition struct,
>>>> which makes it difficult to write programs that can be loaded with both
>>>> custom loaders and iproute2.
>>>>
>>>> This series seeks to address this by converting iproute2 to using libbpf
>>>> for all its bpf needs. This version is an early proof-of-concept RFC, to
>>>> get some feedback on whether people think this is the right direction.
>>>>
>>>> What this series does is the following:
>>>>
>>>> - Updates the libbpf map definition struct to match that of iproute2
>>>>    (patch 1).
>>>
>>> Thanks for taking a stab at unifying libbpf and iproute2 loaders. I'm
>>> totally in support of making iproute2 use libbpf to load/initialize
>>> BPF programs. But I'm against adding iproute2-specific fields to
>>> libbpf's bpf_map_def definitions to support this.
>>>
>>> I've proposed the plan of extending libbpf's supported features so
>>> that it can be used to load iproute2-style BPF programs earlier,
>>> please see discussions in [0] and [1].
>>
>> Yeah, I've seen that discussion, and agree that longer term this is
>> probably a better way to do map-in-map definitions.
>>
>> However, I view your proposal as complementary to this series: we'll
>> probably also want the BTF-based definition to work with iproute2, and
>> that means iproute2 needs to be ported to libbpf. But iproute2 needs to
>> be backwards compatible with the format it supports now, and, well, this
>> series is the simplest way to achieve that IMO :)
> 
> Ok, I understand that. But I'd still want to avoid adding extra cruft
> to libbpf just for backwards-compatibility with *exact* iproute2
> format. Libbpf as a whole is trying to move away from relying on
> binary bpf_map_def and into using BTF-defined map definitions, and
> this patch series is a step backwards in that regard, that adds,
> essentially, already outdated stuff that we'll need to support forever
> (I mean those extra fields in bpf_map_def, that will stay there
> forever).

Agree, adding these extensions for libbpf would be a step backwards
compared to using BTF defined map defs.

> We've discussed one way to deal with it, IMO, in a cleaner way. It can
> be done in few steps:
> 
> 1. I originally wanted BTF-defined map definitions to ignore unknown
> fields. It shouldn't be a default mode, but it should be supported
> (and of course is very easy to add). So let's add that and let libbpf
> ignore unknown stuff.
> 
> 2. Then to let iproute2 loader deal with backwards-compatibility for
> libbpf-incompatible bpf_elf_map, we need to "pass-through" all those
> fields so that users of libbpf (iproute2 loader, in this case) can
> make use of it. The easiest and cleanest way to do this is to expose
> BTF ID of a type describing each map entry and let iproute2 process
> that in whichever way it sees fit.
> 
> Luckily, bpf_elf_map is compatible in `type` field, which will let
> libbpf recognize bpf_elf_map as map definition. All the rest setup
> will be done by iproute2, by processing BTF of bpf_elf_map, which will
> let it set up map sizes, flags and do all of its map-in-map magic.
> 
> The only additions to libbpf in this case would be a new `__u32
> bpf_map__btf_id(struct bpf_map* map);` API.
> 
> I haven't written any code and haven't 100% checked that this will
> cover everything, but I think we should try. This will allow to let
> users of libbpf do custom stuff with map definitions without having to
> put all this extra logic into libbpf itself, which I think is
> desirable outcome.

Sounds reasonable in general, but all this still has the issue that we're
assuming that BTF is /always/ present. Existing object files that would load
just fine /today/ but do not have BTF attached won't be handled here. Wouldn't
it be more straight forward to allow passing callbacks to the libbpf loader
such that if the map section is not found to be bpf_map_def compatible, we
rely on external user aka callback to parse the ELF section, handle any
non-default libbpf behavior like pinning/retrieving from BPF fs, populate
related internal libbpf map data structures and pass control back to libbpf
loader afterwards. (Similar callback with prog section name handling for the
case where tail call maps get automatically populated.)

Thanks,
Daniel
Toke Høiland-Jørgensen Aug. 22, 2019, 10:38 a.m. UTC | #8
Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:

> On Wed, Aug 21, 2019 at 4:29 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>
>> Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:
>>
>> > On Tue, Aug 20, 2019 at 01:47:01PM +0200, Toke Høiland-Jørgensen wrote:
>> >> iproute2 uses its own bpf loader to load eBPF programs, which has
>> >> evolved separately from libbpf. Since we are now standardising on
>> >> libbpf, this becomes a problem as iproute2 is slowly accumulating
>> >> feature incompatibilities with libbpf-based loaders. In particular,
>> >> iproute2 has its own (expanded) version of the map definition struct,
>> >> which makes it difficult to write programs that can be loaded with both
>> >> custom loaders and iproute2.
>> >>
>> >> This series seeks to address this by converting iproute2 to using libbpf
>> >> for all its bpf needs. This version is an early proof-of-concept RFC, to
>> >> get some feedback on whether people think this is the right direction.
>> >>
>> >> What this series does is the following:
>> >>
>> >> - Updates the libbpf map definition struct to match that of iproute2
>> >>   (patch 1).
>> >> - Adds functionality to libbpf to support automatic pinning of maps when
>> >>   loading an eBPF program, while re-using pinned maps if they already
>> >>   exist (patches 2-3).
>> >> - Modifies iproute2 to make it possible to compile it against libbpf
>> >>   without affecting any existing functionality (patch 4).
>> >> - Changes the iproute2 eBPF loader to use libbpf for loading XDP
>> >>   programs (patch 5).
>> >>
>> >>
>> >> As this is an early PoC, there are still a few missing pieces before
>> >> this can be merged. Including (but probably not limited to):
>> >>
>> >> - Consolidate the map definition struct in the bpf_helpers.h file in the
>> >>   kernel tree. This contains a different, and incompatible, update to
>> >>   the struct. Since the iproute2 version has actually been released for
>> >>   use outside the kernel tree (and thus is subject to API stability
>> >>   constraints), I think it makes the most sense to keep that, and port
>> >>   the selftests to use it.
>> >
>> > It sounds like you're implying that existing libbpf format is not
>> > uapi.
>>
>> No, that's not what I meant... See below.
>>
>> > It is and we cannot break it.
>> > If patch 1 means breakage for existing pre-compiled .o that won't load
>> > with new libbpf then we cannot use this method.
>> > Recompiling .o with new libbpf definition of bpf_map_def isn't an option.
>> > libbpf has to be smart before/after and recognize both old and iproute2 format.
>>
>> The libbpf.h definition of struct bpf_map_def is compatible with the one
>> used in iproute2. In libbpf.h, the struct only contains five fields
>> (type, key_size, value_size, max_entries and flags), and iproute2 adds
>> another 4 (id, pinning, inner_id and inner_idx; these are the ones in
>> patch 1 in this series).
>>
>> The issue I was alluding to above is that the bpf_helpers.h file in the
>> kernel selftests directory *also* extends the bpf_map_def struct, and
>> adds two *different* fields (inner_map_idx and numa_mode). The former is
>> used to implement the same map-in-map definition functionality that
>> iproute2 has, but with different semantics. The latter is additional to
>> that, and I'm planning to add that to this series.
>>
>> Since bpf_helpers.h is *not* part of libbpf (yet), this will make it
>
> We should start considering it as if it was, so if we can avoid adding
> stuff that I'd need to untangle to move it into libbpf, I'd rather
> avoid it.
> We've already prepared this move by relicensing bpf_helpers.h. Moving
> it into libbpf itself is immediate next thing I'll do when I'm back.

Yeah, I figured that with the relicensing, bpf_helpers would probably be
making its way into libbpf soon. Which is why I wanted to start this
discussion before that: If we do move bpf_helpers as-is, that will put
us in the territory of full-on binary incompatibility. So the time to
discuss doing this in a compatible way is now, before any such move is
made.

-Toke
Toke Høiland-Jørgensen Aug. 22, 2019, 11:48 a.m. UTC | #9
Daniel Borkmann <daniel@iogearbox.net> writes:

> On 8/22/19 9:49 AM, Andrii Nakryiko wrote:
>> On Wed, Aug 21, 2019 at 2:07 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
>>>> On Tue, Aug 20, 2019 at 4:47 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>>>>
>>>>> iproute2 uses its own bpf loader to load eBPF programs, which has
>>>>> evolved separately from libbpf. Since we are now standardising on
>>>>> libbpf, this becomes a problem as iproute2 is slowly accumulating
>>>>> feature incompatibilities with libbpf-based loaders. In particular,
>>>>> iproute2 has its own (expanded) version of the map definition struct,
>>>>> which makes it difficult to write programs that can be loaded with both
>>>>> custom loaders and iproute2.
>>>>>
>>>>> This series seeks to address this by converting iproute2 to using libbpf
>>>>> for all its bpf needs. This version is an early proof-of-concept RFC, to
>>>>> get some feedback on whether people think this is the right direction.
>>>>>
>>>>> What this series does is the following:
>>>>>
>>>>> - Updates the libbpf map definition struct to match that of iproute2
>>>>>    (patch 1).
>>>>
>>>> Thanks for taking a stab at unifying libbpf and iproute2 loaders. I'm
>>>> totally in support of making iproute2 use libbpf to load/initialize
>>>> BPF programs. But I'm against adding iproute2-specific fields to
>>>> libbpf's bpf_map_def definitions to support this.
>>>>
>>>> I've proposed the plan of extending libbpf's supported features so
>>>> that it can be used to load iproute2-style BPF programs earlier,
>>>> please see discussions in [0] and [1].
>>>
>>> Yeah, I've seen that discussion, and agree that longer term this is
>>> probably a better way to do map-in-map definitions.
>>>
>>> However, I view your proposal as complementary to this series: we'll
>>> probably also want the BTF-based definition to work with iproute2, and
>>> that means iproute2 needs to be ported to libbpf. But iproute2 needs to
>>> be backwards compatible with the format it supports now, and, well, this
>>> series is the simplest way to achieve that IMO :)
>> 
>> Ok, I understand that. But I'd still want to avoid adding extra cruft
>> to libbpf just for backwards-compatibility with *exact* iproute2
>> format. Libbpf as a whole is trying to move away from relying on
>> binary bpf_map_def and into using BTF-defined map definitions, and
>> this patch series is a step backwards in that regard, that adds,
>> essentially, already outdated stuff that we'll need to support forever
>> (I mean those extra fields in bpf_map_def, that will stay there
>> forever).
>
> Agree, adding these extensions for libbpf would be a step backwards
> compared to using BTF defined map defs.
>
>> We've discussed one way to deal with it, IMO, in a cleaner way. It can
>> be done in few steps:
>> 
>> 1. I originally wanted BTF-defined map definitions to ignore unknown
>> fields. It shouldn't be a default mode, but it should be supported
>> (and of course is very easy to add). So let's add that and let libbpf
>> ignore unknown stuff.
>> 
>> 2. Then to let iproute2 loader deal with backwards-compatibility for
>> libbpf-incompatible bpf_elf_map, we need to "pass-through" all those
>> fields so that users of libbpf (iproute2 loader, in this case) can
>> make use of it. The easiest and cleanest way to do this is to expose
>> BTF ID of a type describing each map entry and let iproute2 process
>> that in whichever way it sees fit.
>> 
>> Luckily, bpf_elf_map is compatible in `type` field, which will let
>> libbpf recognize bpf_elf_map as map definition. All the rest setup
>> will be done by iproute2, by processing BTF of bpf_elf_map, which will
>> let it set up map sizes, flags and do all of its map-in-map magic.
>> 
>> The only additions to libbpf in this case would be a new `__u32
>> bpf_map__btf_id(struct bpf_map* map);` API.
>> 
>> I haven't written any code and haven't 100% checked that this will
>> cover everything, but I think we should try. This will allow to let
>> users of libbpf do custom stuff with map definitions without having to
>> put all this extra logic into libbpf itself, which I think is
>> desirable outcome.
>
> Sounds reasonable in general, but all this still has the issue that
> we're assuming that BTF is /always/ present. Existing object files
> that would load just fine /today/ but do not have BTF attached won't
> be handled here. Wouldn't it be more straight forward to allow passing
> callbacks to the libbpf loader such that if the map section is not
> found to be bpf_map_def compatible, we rely on external user aka
> callback to parse the ELF section, handle any non-default libbpf
> behavior like pinning/retrieving from BPF fs, populate related
> internal libbpf map data structures and pass control back to libbpf
> loader afterwards. (Similar callback with prog section name handling
> for the case where tail call maps get automatically populated.)

Thinking about this some more, I think there are two separate issues
here:

1. Do we want libbpf to support the features currently in iproute2 and
   bpf_helpers (i.e., map pinning + reuse, map-in-map definitions, and
   NUMA node placement of maps). IMO the answer to this is yes.

2. What should the data format be for BPF programs to signal that they
   want to use those features? Here, the longer-term answer is BTF-based
   map definitions, but we still want iproute2 to be backwards
   compatible.


So how about I revise this patch series to implement the *features* (I
already implemented map-in-map and numa nodes[0], so that is sorta
already done), but instead of extending the bpf_map_def struct, I just
expose callbacks that will allow programs to fill in internal-to-libbpf
data structures with the required information. Then, once the BTF-based
map definition does land, that can simply define default callbacks that
uses the BTF information to fill in those same internal data structures?

This would mean no extending bpf_map_def, and relaxing the current
libbpf restriction on extending bpf_map_def.

The drawback of this approach is that it does nothing to combat
fragmentation: People building their own loaders can still reimplement
different semantics for map defs, leading to programs that are tied to a
particular loader. So this would only work if we really believe BTF can
save us from this. I don't feel competent to comment on this just yet,
but thought I'd mention it :)

-Toke
Toke Høiland-Jørgensen Aug. 22, 2019, 11:49 a.m. UTC | #10
Toke Høiland-Jørgensen <toke@redhat.com> writes:

> Daniel Borkmann <daniel@iogearbox.net> writes:
>
>> On 8/22/19 9:49 AM, Andrii Nakryiko wrote:
>>> On Wed, Aug 21, 2019 at 2:07 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>>> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
>>>>> On Tue, Aug 20, 2019 at 4:47 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>>>>>
>>>>>> iproute2 uses its own bpf loader to load eBPF programs, which has
>>>>>> evolved separately from libbpf. Since we are now standardising on
>>>>>> libbpf, this becomes a problem as iproute2 is slowly accumulating
>>>>>> feature incompatibilities with libbpf-based loaders. In particular,
>>>>>> iproute2 has its own (expanded) version of the map definition struct,
>>>>>> which makes it difficult to write programs that can be loaded with both
>>>>>> custom loaders and iproute2.
>>>>>>
>>>>>> This series seeks to address this by converting iproute2 to using libbpf
>>>>>> for all its bpf needs. This version is an early proof-of-concept RFC, to
>>>>>> get some feedback on whether people think this is the right direction.
>>>>>>
>>>>>> What this series does is the following:
>>>>>>
>>>>>> - Updates the libbpf map definition struct to match that of iproute2
>>>>>>    (patch 1).
>>>>>
>>>>> Thanks for taking a stab at unifying libbpf and iproute2 loaders. I'm
>>>>> totally in support of making iproute2 use libbpf to load/initialize
>>>>> BPF programs. But I'm against adding iproute2-specific fields to
>>>>> libbpf's bpf_map_def definitions to support this.
>>>>>
>>>>> I've proposed the plan of extending libbpf's supported features so
>>>>> that it can be used to load iproute2-style BPF programs earlier,
>>>>> please see discussions in [0] and [1].
>>>>
>>>> Yeah, I've seen that discussion, and agree that longer term this is
>>>> probably a better way to do map-in-map definitions.
>>>>
>>>> However, I view your proposal as complementary to this series: we'll
>>>> probably also want the BTF-based definition to work with iproute2, and
>>>> that means iproute2 needs to be ported to libbpf. But iproute2 needs to
>>>> be backwards compatible with the format it supports now, and, well, this
>>>> series is the simplest way to achieve that IMO :)
>>> 
>>> Ok, I understand that. But I'd still want to avoid adding extra cruft
>>> to libbpf just for backwards-compatibility with *exact* iproute2
>>> format. Libbpf as a whole is trying to move away from relying on
>>> binary bpf_map_def and into using BTF-defined map definitions, and
>>> this patch series is a step backwards in that regard, that adds,
>>> essentially, already outdated stuff that we'll need to support forever
>>> (I mean those extra fields in bpf_map_def, that will stay there
>>> forever).
>>
>> Agree, adding these extensions for libbpf would be a step backwards
>> compared to using BTF defined map defs.
>>
>>> We've discussed one way to deal with it, IMO, in a cleaner way. It can
>>> be done in few steps:
>>> 
>>> 1. I originally wanted BTF-defined map definitions to ignore unknown
>>> fields. It shouldn't be a default mode, but it should be supported
>>> (and of course is very easy to add). So let's add that and let libbpf
>>> ignore unknown stuff.
>>> 
>>> 2. Then to let iproute2 loader deal with backwards-compatibility for
>>> libbpf-incompatible bpf_elf_map, we need to "pass-through" all those
>>> fields so that users of libbpf (iproute2 loader, in this case) can
>>> make use of it. The easiest and cleanest way to do this is to expose
>>> BTF ID of a type describing each map entry and let iproute2 process
>>> that in whichever way it sees fit.
>>> 
>>> Luckily, bpf_elf_map is compatible in `type` field, which will let
>>> libbpf recognize bpf_elf_map as map definition. All the rest setup
>>> will be done by iproute2, by processing BTF of bpf_elf_map, which will
>>> let it set up map sizes, flags and do all of its map-in-map magic.
>>> 
>>> The only additions to libbpf in this case would be a new `__u32
>>> bpf_map__btf_id(struct bpf_map* map);` API.
>>> 
>>> I haven't written any code and haven't 100% checked that this will
>>> cover everything, but I think we should try. This will allow to let
>>> users of libbpf do custom stuff with map definitions without having to
>>> put all this extra logic into libbpf itself, which I think is
>>> desirable outcome.
>>
>> Sounds reasonable in general, but all this still has the issue that
>> we're assuming that BTF is /always/ present. Existing object files
>> that would load just fine /today/ but do not have BTF attached won't
>> be handled here. Wouldn't it be more straight forward to allow passing
>> callbacks to the libbpf loader such that if the map section is not
>> found to be bpf_map_def compatible, we rely on external user aka
>> callback to parse the ELF section, handle any non-default libbpf
>> behavior like pinning/retrieving from BPF fs, populate related
>> internal libbpf map data structures and pass control back to libbpf
>> loader afterwards. (Similar callback with prog section name handling
>> for the case where tail call maps get automatically populated.)
>
> Thinking about this some more, I think there are two separate issues
> here:
>
> 1. Do we want libbpf to support the features currently in iproute2 and
>    bpf_helpers (i.e., map pinning + reuse, map-in-map definitions, and
>    NUMA node placement of maps). IMO the answer to this is yes.
>
> 2. What should the data format be for BPF programs to signal that they
>    want to use those features? Here, the longer-term answer is BTF-based
>    map definitions, but we still want iproute2 to be backwards
>    compatible.
>
>
> So how about I revise this patch series to implement the *features* (I
> already implemented map-in-map and numa nodes[0], so that is sorta
> already done), but instead of extending the bpf_map_def struct, I just
> expose callbacks that will allow programs to fill in internal-to-libbpf
> data structures with the required information. Then, once the BTF-based
> map definition does land, that can simply define default callbacks that
> uses the BTF information to fill in those same internal data structures?
>
> This would mean no extending bpf_map_def, and relaxing the current
> libbpf restriction on extending bpf_map_def.
>
> The drawback of this approach is that it does nothing to combat
> fragmentation: People building their own loaders can still reimplement
> different semantics for map defs, leading to programs that are tied to a
> particular loader. So this would only work if we really believe BTF can
> save us from this. I don't feel competent to comment on this just yet,
> but thought I'd mention it :)
>
> -Toke

[0] was supposed to be a reference to
    https://github.com/tohojo/libbpf/tree/iproute2-compat
Andrii Nakryiko Aug. 23, 2019, 6:31 a.m. UTC | #11
On Thu, Aug 22, 2019 at 1:33 AM Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> On 8/22/19 9:49 AM, Andrii Nakryiko wrote:
> > On Wed, Aug 21, 2019 at 2:07 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> >> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
> >>> On Tue, Aug 20, 2019 at 4:47 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> >>>>
> >>>> iproute2 uses its own bpf loader to load eBPF programs, which has
> >>>> evolved separately from libbpf. Since we are now standardising on
> >>>> libbpf, this becomes a problem as iproute2 is slowly accumulating
> >>>> feature incompatibilities with libbpf-based loaders. In particular,
> >>>> iproute2 has its own (expanded) version of the map definition struct,
> >>>> which makes it difficult to write programs that can be loaded with both
> >>>> custom loaders and iproute2.
> >>>>
> >>>> This series seeks to address this by converting iproute2 to using libbpf
> >>>> for all its bpf needs. This version is an early proof-of-concept RFC, to
> >>>> get some feedback on whether people think this is the right direction.
> >>>>
> >>>> What this series does is the following:
> >>>>
> >>>> - Updates the libbpf map definition struct to match that of iproute2
> >>>>    (patch 1).
> >>>
> >>> Thanks for taking a stab at unifying libbpf and iproute2 loaders. I'm
> >>> totally in support of making iproute2 use libbpf to load/initialize
> >>> BPF programs. But I'm against adding iproute2-specific fields to
> >>> libbpf's bpf_map_def definitions to support this.
> >>>
> >>> I've proposed the plan of extending libbpf's supported features so
> >>> that it can be used to load iproute2-style BPF programs earlier,
> >>> please see discussions in [0] and [1].
> >>
> >> Yeah, I've seen that discussion, and agree that longer term this is
> >> probably a better way to do map-in-map definitions.
> >>
> >> However, I view your proposal as complementary to this series: we'll
> >> probably also want the BTF-based definition to work with iproute2, and
> >> that means iproute2 needs to be ported to libbpf. But iproute2 needs to
> >> be backwards compatible with the format it supports now, and, well, this
> >> series is the simplest way to achieve that IMO :)
> >
> > Ok, I understand that. But I'd still want to avoid adding extra cruft
> > to libbpf just for backwards-compatibility with *exact* iproute2
> > format. Libbpf as a whole is trying to move away from relying on
> > binary bpf_map_def and into using BTF-defined map definitions, and
> > this patch series is a step backwards in that regard, that adds,
> > essentially, already outdated stuff that we'll need to support forever
> > (I mean those extra fields in bpf_map_def, that will stay there
> > forever).
>
> Agree, adding these extensions for libbpf would be a step backwards
> compared to using BTF defined map defs.
>
> > We've discussed one way to deal with it, IMO, in a cleaner way. It can
> > be done in few steps:
> >
> > 1. I originally wanted BTF-defined map definitions to ignore unknown
> > fields. It shouldn't be a default mode, but it should be supported
> > (and of course is very easy to add). So let's add that and let libbpf
> > ignore unknown stuff.
> >
> > 2. Then to let iproute2 loader deal with backwards-compatibility for
> > libbpf-incompatible bpf_elf_map, we need to "pass-through" all those
> > fields so that users of libbpf (iproute2 loader, in this case) can
> > make use of it. The easiest and cleanest way to do this is to expose
> > BTF ID of a type describing each map entry and let iproute2 process
> > that in whichever way it sees fit.
> >
> > Luckily, bpf_elf_map is compatible in `type` field, which will let
> > libbpf recognize bpf_elf_map as map definition. All the rest setup
> > will be done by iproute2, by processing BTF of bpf_elf_map, which will
> > let it set up map sizes, flags and do all of its map-in-map magic.
> >
> > The only additions to libbpf in this case would be a new `__u32
> > bpf_map__btf_id(struct bpf_map* map);` API.
> >
> > I haven't written any code and haven't 100% checked that this will
> > cover everything, but I think we should try. This will allow to let
> > users of libbpf do custom stuff with map definitions without having to
> > put all this extra logic into libbpf itself, which I think is
> > desirable outcome.
>
> Sounds reasonable in general, but all this still has the issue that we're
> assuming that BTF is /always/ present. Existing object files that would load
> just fine /today/ but do not have BTF attached won't be handled here. Wouldn't
> it be more straight forward to allow passing callbacks to the libbpf loader
> such that if the map section is not found to be bpf_map_def compatible, we
> rely on external user aka callback to parse the ELF section, handle any
> non-default libbpf behavior like pinning/retrieving from BPF fs, populate
> related internal libbpf map data structures and pass control back to libbpf

Having all those special callbacks feels a bit too narrow-focused. I
do agree that we need to provide enough flexibility to allow
non-standard BPF loaders like iproute2 to adjust and/or extend some of
BPF initialization logic, though. I'm just unsure if adding more and
more callbacks is the right approach.

Let's think slightly beyond iproute2, because iproute2 and libbpf use
the same ELF section name ("maps") for non-BTF-defined map defs, which
makes it easy to fall into the trap of designing too specific
solution. Let's take, say, BCC. BCC uses one section per each map.
And, unlike, iproute2, ELF section names don't coincide. So what if
BCC were to use libbpf as an underlying BPF loader, while preserving
its layout? This callback for "maps" section won't work at all.

BTW, I realize BCC might not be the best example here, given
on-the-fly complication nature of its programs and extensive usage of
macro, which provide quite a lot of flexibility in changing ELF
layout, if necessary, without changing user programs, but bear with
me, let's pretend we can't change some of those aspects easily. The
point I'm trying to make is that iproute2 and libbpf cases are
examples of almost compatible ELF layouts, and if we try to solve just
such case, we might end up inventing too specific solution.

What seems like a bit more flexible and generic solution is to make
libbpf API granular enough to allow users to adjust/add extra maps,
programs, relocations, whatever is necessary, before libbpf does final
loading or some other checks that would fail for normal libbpf
programs, but are ok for custom BPF programs. So instead of having
callbacks, we have API for each step, where custom loader can skip
and/or adjust behavior of each of them, while also implement their own
steps.

E.g., today's API is essentially three steps:

1. open and parse ELF: collect relos, programs, map definitions
2. load: create maps from collected defs, do program/global data/CO-RE
relocs, load and verify BPF programs
3. attach programs one by one.

Between step 1 and 2 user has flexibility to create more maps, set up
map-in-map, etc. Between 2 and 3 you can fill in global data, fill in
tail call maps, etc. That's already pretty flexible. But we can tune
and break apart those steps even further, if necessary.

E.g., for iproute2 maps, I think the biggest problem is that we can't
create a custom map dynamically, but still allow BPF instructions
relocations against that map. Plus, of course, a name clash of "maps"
ELF section with incompatible map definitions. So how about:

1. for open call, tell it to not parse "maps" section at all. We can
model that as an extra option to override map definitions ELF section
name. E.g., so that some applications can put their map defs into
"my_fancy_map_def_section" ELF section, for instance. If that section
is empty, though, libbpf will just skip step of collecting
bpf_map_defs (it can still do BTF-defined maps, btw), allowing gradual
migration.

2. provide API to add dynamically created (e.g., by iproute2 loader)
maps that are "relocatable to", before program relocations happen.
We'll need to figure out best interface here. Internally relocations
refer to maps as section index + offset and translate that to map
index. If we keep relos (internally) just as section index + offset,
it will be simple and consistent interface to add external maps that
can be relocated against, IMO. Map index is inconvenient in this case.

BTW, we'd need to answer some of those questions regardless, even for
callback-based solution. There are a bunch of details to be worked
out, of course, and I don't have exact answer to all of them right
now, but I think it's a worthwhile exercise to try to answer them and
see how API would look like.

As an aside, taking a step back and thinking about this whole API
design thing, it occurred to me that ELF is just one way to specify
programs, maps, relocations, global data, etc. But it doesn't have to
be just ELF, right? What if we have a use case where we have
"on-the-fly" creation of BPF program, with dynamically added maps,
dynamically generated BPF programs, etc. E.g., think about bpftrace
generating BPF object/program on-the-fly from their DSL language. Or
some pcap to BPF translator for firewall rules, etc, etc. It might be
inconvenient and unnecessary to generate ELF and then pass it to
libbpf to load it. Instead it could be more convenient to create an
empty bpf_object (bpf_object__new) and then use programmatic APIs to
add a bunch of maps (some sort of bpf_object__add_map) and progs
(bpf_object__add_program) with relocations against those maps, and let
libbpf do all the low-level stuff (relos, CO-RE, etc). And provide
high-level API for bpf_map, bpf_program, etc.

How API would look like to support this? Today's bpf_object__open +
bpf__object__load could be just a higher-level wrappers on top of
those APIs, constructing all the entities from standardized ELF
layout. But there still is low-level API to construct same bpf_object
construct dynamically. That seems flexible and powerful and not tied
to any particular use case, but of course requires a bit of thought
about best API.

Sorry for the wall of text :) Callback-based solutions always seem
convoluted to me, as well as hard to follow and quite often too
limited. This topic is not the easiest one to discuss over email, as
well, maybe we should chat about this while at LPC?

> loader afterwards. (Similar callback with prog section name handling for the
> case where tail call maps get automatically populated.)

I'm not sure I completely understand why we need this prog section
name callback. Can you elaborate what problem does it solve that can't
be solved with existing API?

>
> Thanks,
> Daniel
Jesper Dangaard Brouer Aug. 23, 2019, 10:27 a.m. UTC | #12
On Wed, 21 Aug 2019 13:30:09 -0700
Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:

> On Tue, Aug 20, 2019 at 4:47 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> >
> > iproute2 uses its own bpf loader to load eBPF programs, which has
> > evolved separately from libbpf. Since we are now standardising on
> > libbpf, this becomes a problem as iproute2 is slowly accumulating
> > feature incompatibilities with libbpf-based loaders. In particular,
> > iproute2 has its own (expanded) version of the map definition struct,
> > which makes it difficult to write programs that can be loaded with both
> > custom loaders and iproute2.
> >
> > This series seeks to address this by converting iproute2 to using libbpf
> > for all its bpf needs. This version is an early proof-of-concept RFC, to
> > get some feedback on whether people think this is the right direction.
> >
> > What this series does is the following:
> >
> > - Updates the libbpf map definition struct to match that of iproute2
> >   (patch 1).  
> 
> 
> Hi Toke,
> 
> Thanks for taking a stab at unifying libbpf and iproute2 loaders. I'm
> totally in support of making iproute2 use libbpf to load/initialize
> BPF programs. But I'm against adding iproute2-specific fields to
> libbpf's bpf_map_def definitions to support this.
> 
> I've proposed the plan of extending libbpf's supported features so
> that it can be used to load iproute2-style BPF programs earlier,
> please see discussions in [0] and [1]. I think instead of emulating
> iproute2 way of matching everything based on user-specified internal
> IDs, which doesn't provide good user experience and is quite easy to
> get wrong, we should support same scenarios with better declarative
> syntax and in a less error-prone way. I believe we can do that by
> relying on BTF more heavily (again, please check some of my proposals
> in [0], [1], and discussion with Daniel in those threads). It will
> feel more natural and be more straightforward to follow. It would be
> great if you can lend a hand in implementing pieces of that plan!
> 
> I'm currently on vacation, so my availability is very sparse, but I'd
> be happy to discuss this further, if need be.
> 
>   [0] https://lore.kernel.org/bpf/CAEf4BzbfdG2ub7gCi0OYqBrUoChVHWsmOntWAkJt47=FE+km+A@mail.gmail.com/
>   [1] https://www.spinics.net/lists/bpf/msg03976.html
> 
> > - Adds functionality to libbpf to support automatic pinning of maps when
> >   loading an eBPF program, while re-using pinned maps if they already
> >   exist (patches 2-3).

For production use-cases, libbpf really need an easier higher-level API
for re-using pinned maps, for establishing shared maps between
programs.  The existing libbpf API bpf_object__pin_maps() and
bpf_object__unpin_maps(), which don't re-use pinned maps, are not
really usable, because they pin/unpin ALL maps in the ELF file.

What users really need is an easy way to specify, on a per map basis,
what kind of pinning and reuse/sharing they want.  E.g. like iproute2
have, "global", "object-scope", and "no-pinning". ("ifindex-scope" would
be nice for XDP).
  Today users have to split/reimplement bpf_prog_load_xattr(), and
use/add bpf_map__reuse_fd().  Which is that I ended doing for
xdp-cpumap-tc[2] (used in production at ISP) resulting in 142 lines of
extra code[3] that should have been hidden inside libbpf.  And worse,
in this solution[4] the maps for reuse-pinning is specified in the code
by name.  Thus, they cannot use a generic loader.  That I why, I want
to mark the maps via a pinning member, like iproute2.

I really hope this moves in a practical direction, as I have the next
production request lined up (also from an ISP), and I hate to have to
advice them to choose the same route as [3].


[2] https://github.com/xdp-project/xdp-cpumap-tc/
[3] https://github.com/xdp-project/xdp-cpumap-tc/blob/master/src/xdp_iphash_to_cpu_user.c#L262-L403
[4] https://github.com/xdp-project/xdp-cpumap-tc/blob/master/src/xdp_iphash_to_cpu_user.c#L431-L441
Toke Høiland-Jørgensen Aug. 23, 2019, 11:29 a.m. UTC | #13
[ ... snip ...]

> E.g., today's API is essentially three steps:
>
> 1. open and parse ELF: collect relos, programs, map definitions
> 2. load: create maps from collected defs, do program/global data/CO-RE
> relocs, load and verify BPF programs
> 3. attach programs one by one.
>
> Between step 1 and 2 user has flexibility to create more maps, set up
> map-in-map, etc. Between 2 and 3 you can fill in global data, fill in
> tail call maps, etc. That's already pretty flexible. But we can tune
> and break apart those steps even further, if necessary.

Today, steps 1 and 2 can be collapsed into a single call to
bpf_prog_load_xattr(). As Jesper's mail explains, for XDP we don't
generally want to do all the fancy rewriting stuff, we just want a
simple way to load a program and get reusable pinning of maps.
Preferably in a way that is compatible with the iproute2 loader.

So I really think we need two things:

(1) a flexible API that splits up all the various steps in a way that
    allows programs to inject their own map definitions before
    relocations and loading

(2) a simple convenience wrapper that loads an object file, does
    something sensible with pinning and map-in-map definitions, and loads
    everything into the kernel.

I'd go so far as to say that (2) should even support system-wide
configuration, similar to the /etc/iproute2/bpf_pinning file. E.g., an
/etc/libbpf/pinning.conf file that sets the default pinning directory,
and makes it possible to set up pin-value-to-subdir mappings like what
iproute2 does today.

Having (2) makes it more likely that all the different custom loaders
will be compatible with each other, while still allowing people to do
their own custom thing with (1). And of course, (2) could be implemented
in terms of (1) internally in libbpf.

In my ideal world, (2) would just use the definition format already in
iproute2 (this is basically what I implemented already), but if you guys
don't want to put this into libbpf, I can probably live with the default
format being BTF-based instead. Which would mean that iproute2 I would
end up with a flow like this:

- When given an elf file, try to run it through the "standard loader"
  (2). If this works, great, proceed to program attach.

- If using (2) fails because it doesn't understand the map definition,
  fall back to a compatibility loader that parses the legacy iproute2
  map definition format and uses (1) to load that.


Does the above make sense? :)

-Toke
Andrii Nakryiko Aug. 28, 2019, 8:23 p.m. UTC | #14
On Fri, Aug 23, 2019 at 3:27 AM Jesper Dangaard Brouer
<brouer@redhat.com> wrote:
>
> On Wed, 21 Aug 2019 13:30:09 -0700
> Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
>
> > On Tue, Aug 20, 2019 at 4:47 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> > >
> > > iproute2 uses its own bpf loader to load eBPF programs, which has
> > > evolved separately from libbpf. Since we are now standardising on
> > > libbpf, this becomes a problem as iproute2 is slowly accumulating
> > > feature incompatibilities with libbpf-based loaders. In particular,
> > > iproute2 has its own (expanded) version of the map definition struct,
> > > which makes it difficult to write programs that can be loaded with both
> > > custom loaders and iproute2.
> > >
> > > This series seeks to address this by converting iproute2 to using libbpf
> > > for all its bpf needs. This version is an early proof-of-concept RFC, to
> > > get some feedback on whether people think this is the right direction.
> > >
> > > What this series does is the following:
> > >
> > > - Updates the libbpf map definition struct to match that of iproute2
> > >   (patch 1).
> >
> >
> > Hi Toke,
> >
> > Thanks for taking a stab at unifying libbpf and iproute2 loaders. I'm
> > totally in support of making iproute2 use libbpf to load/initialize
> > BPF programs. But I'm against adding iproute2-specific fields to
> > libbpf's bpf_map_def definitions to support this.
> >
> > I've proposed the plan of extending libbpf's supported features so
> > that it can be used to load iproute2-style BPF programs earlier,
> > please see discussions in [0] and [1]. I think instead of emulating
> > iproute2 way of matching everything based on user-specified internal
> > IDs, which doesn't provide good user experience and is quite easy to
> > get wrong, we should support same scenarios with better declarative
> > syntax and in a less error-prone way. I believe we can do that by
> > relying on BTF more heavily (again, please check some of my proposals
> > in [0], [1], and discussion with Daniel in those threads). It will
> > feel more natural and be more straightforward to follow. It would be
> > great if you can lend a hand in implementing pieces of that plan!
> >
> > I'm currently on vacation, so my availability is very sparse, but I'd
> > be happy to discuss this further, if need be.
> >
> >   [0] https://lore.kernel.org/bpf/CAEf4BzbfdG2ub7gCi0OYqBrUoChVHWsmOntWAkJt47=FE+km+A@mail.gmail.com/
> >   [1] https://www.spinics.net/lists/bpf/msg03976.html
> >
> > > - Adds functionality to libbpf to support automatic pinning of maps when
> > >   loading an eBPF program, while re-using pinned maps if they already
> > >   exist (patches 2-3).
>
> For production use-cases, libbpf really need an easier higher-level API
> for re-using pinned maps, for establishing shared maps between
> programs.  The existing libbpf API bpf_object__pin_maps() and
> bpf_object__unpin_maps(), which don't re-use pinned maps, are not
> really usable, because they pin/unpin ALL maps in the ELF file.
>
> What users really need is an easy way to specify, on a per map basis,
> what kind of pinning and reuse/sharing they want.  E.g. like iproute2
> have, "global", "object-scope", and "no-pinning". ("ifindex-scope" would
> be nice for XDP).

I totally agree and I think this is easy to add both for BTF-defined
and "classic" bpf_map_def maps. Daniel mentioned in one of the
previous threads that in practice object-scope doesn't seem to be
used, so I'd say we should start with no-pinning + global pinning as
two initial supported values for pinning attribute. ifindex-scope is
interesting, but I'd love to hear a bit more about the use cases.

>   Today users have to split/reimplement bpf_prog_load_xattr(), and
> use/add bpf_map__reuse_fd().  Which is that I ended doing for

Honestly, bpf_prog_load_xattr() existence seems redundant to me. It's
basically just bpf_object__open + bpf_object__load. There is a piece
in the middle with "guessing" program types, but it should just be
moved into bpf_object__open and happen by default. Using open + load
gives more control and isn't really harder than bpf_prog_load_xattr.
bpf_prog_load_xattr which might be slightly more convenient for simple
use case, but falls apart immediately if you need to tune anything
before load.

> xdp-cpumap-tc[2] (used in production at ISP) resulting in 142 lines of
> extra code[3] that should have been hidden inside libbpf.  And worse,
> in this solution[4] the maps for reuse-pinning is specified in the code
> by name.  Thus, they cannot use a generic loader.  That I why, I want
> to mark the maps via a pinning member, like iproute2.
>
> I really hope this moves in a practical direction, as I have the next
> production request lined up (also from an ISP), and I hate to have to
> advice them to choose the same route as [3].

It seems to me that map pinning doesn't need much discussion at this
point, let's start with no-pinning + global pinning. To accommodate
pinning at custom root, bpf_object__open_xattr should accept extra
argument with non-default pinning root path. That should solve your
case completely, shouldn't it? Ultimately, with BTF-defined maps it
should be possible to specify custom pinning path on per-map basis for
cases where user needs ultimate non-uniform manual control.

>
>
> [2] https://github.com/xdp-project/xdp-cpumap-tc/
> [3] https://github.com/xdp-project/xdp-cpumap-tc/blob/master/src/xdp_iphash_to_cpu_user.c#L262-L403
> [4] https://github.com/xdp-project/xdp-cpumap-tc/blob/master/src/xdp_iphash_to_cpu_user.c#L431-L441
> --
> Best regards,
>   Jesper Dangaard Brouer
>   MSc.CS, Principal Kernel Engineer at Red Hat
>   LinkedIn: http://www.linkedin.com/in/brouer
Andrii Nakryiko Aug. 28, 2019, 8:40 p.m. UTC | #15
On Fri, Aug 23, 2019 at 4:29 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> [ ... snip ...]
>
> > E.g., today's API is essentially three steps:
> >
> > 1. open and parse ELF: collect relos, programs, map definitions
> > 2. load: create maps from collected defs, do program/global data/CO-RE
> > relocs, load and verify BPF programs
> > 3. attach programs one by one.
> >
> > Between step 1 and 2 user has flexibility to create more maps, set up
> > map-in-map, etc. Between 2 and 3 you can fill in global data, fill in
> > tail call maps, etc. That's already pretty flexible. But we can tune
> > and break apart those steps even further, if necessary.
>
> Today, steps 1 and 2 can be collapsed into a single call to
> bpf_prog_load_xattr(). As Jesper's mail explains, for XDP we don't
> generally want to do all the fancy rewriting stuff, we just want a
> simple way to load a program and get reusable pinning of maps.

I agree. See my response to Jesper's message. Note also my view of
bpf_prog_load_xattr() existence.

> Preferably in a way that is compatible with the iproute2 loader.
>
> So I really think we need two things:
>
> (1) a flexible API that splits up all the various steps in a way that
>     allows programs to inject their own map definitions before
>     relocations and loading
>
> (2) a simple convenience wrapper that loads an object file, does
>     something sensible with pinning and map-in-map definitions, and loads
>     everything into the kernel.

I agree. I think this wrapper is bpf_object__open + bpf_object__load
(bpf_prog_load_xattr will do as well, if you don't need to do anything
between open and load). I think pinning is simple to add in minimal
form and is pretty non-controversial (there is some ambiguity as to
how to handle merging of prog array maps, or maybe not just prog array
maps, but that can be controlled later through extra flags/attributes,
so I'd start with something sensible as a default behavior).

>
> I'd go so far as to say that (2) should even support system-wide
> configuration, similar to the /etc/iproute2/bpf_pinning file. E.g., an
> /etc/libbpf/pinning.conf file that sets the default pinning directory,
> and makes it possible to set up pin-value-to-subdir mappings like what
> iproute2 does today.

This I'm a bit hesitant about. It feels like it's not library's job to
read some system-wide configs modifying its behavior. We have all
those _xattr methods, which allow to override sensible defaults, I'd
try to go as far as possible with just that before doing
libbpf-specific /etc configs.

>
> Having (2) makes it more likely that all the different custom loaders
> will be compatible with each other, while still allowing people to do
> their own custom thing with (1). And of course, (2) could be implemented
> in terms of (1) internally in libbpf.
>
> In my ideal world, (2) would just use the definition format already in
> iproute2 (this is basically what I implemented already), but if you guys
> don't want to put this into libbpf, I can probably live with the default

I want to avoid having legacy-at-the-time-it-was-added code in libbpf
that we'd need to support for a long time, that solves only iproute2
cases, which is why I'm pushing back. With BTF we can support same
functionality in better form, which is what I want to prioritize and
which will be beneficial to the whole BPF ecosystem.

But I also want to make libbpf useful to iproute2 and other custom
loaders that have to support existing formats, and thus my proposal to
have libbpf provide granular enough APIs to augment default format in
non-intrusive way. Should this be callback-based or not is secondary,
though important to API design, concern.

> format being BTF-based instead. Which would mean that iproute2 I would
> end up with a flow like this:
>
> - When given an elf file, try to run it through the "standard loader"
>   (2). If this works, great, proceed to program attach.
>
> - If using (2) fails because it doesn't understand the map definition,
>   fall back to a compatibility loader that parses the legacy iproute2
>   map definition format and uses (1) to load that.
>
>
> Does the above make sense? :)

It does, yes. Also, with BTF enabled it should be easy to distinguish
between those two (e.g., was bpf_elf_map type used? if yes, then it's
a compatibility format) and not do extra work.

>
> -Toke