mbox series

[RFC,bpf-next,0/5] Convert iproute2 to use libbpf (WIP)

Message ID 20190820114706.18546-1-toke@redhat.com
Headers show
Series Convert iproute2 to use libbpf (WIP) | expand

Message

Toke Høiland-Jørgensen Aug. 20, 2019, 11:47 a.m. UTC
iproute2 uses its own bpf loader to load eBPF programs, which has
evolved separately from libbpf. Since we are now standardising on
libbpf, this becomes a problem as iproute2 is slowly accumulating
feature incompatibilities with libbpf-based loaders. In particular,
iproute2 has its own (expanded) version of the map definition struct,
which makes it difficult to write programs that can be loaded with both
custom loaders and iproute2.

This series seeks to address this by converting iproute2 to using libbpf
for all its bpf needs. This version is an early proof-of-concept RFC, to
get some feedback on whether people think this is the right direction.

What this series does is the following:

- Updates the libbpf map definition struct to match that of iproute2
  (patch 1).
- Adds functionality to libbpf to support automatic pinning of maps when
  loading an eBPF program, while re-using pinned maps if they already
  exist (patches 2-3).
- Modifies iproute2 to make it possible to compile it against libbpf
  without affecting any existing functionality (patch 4).
- Changes the iproute2 eBPF loader to use libbpf for loading XDP
  programs (patch 5).


As this is an early PoC, there are still a few missing pieces before
this can be merged. Including (but probably not limited to):

- Consolidate the map definition struct in the bpf_helpers.h file in the
  kernel tree. This contains a different, and incompatible, update to
  the struct. Since the iproute2 version has actually been released for
  use outside the kernel tree (and thus is subject to API stability
  constraints), I think it makes the most sense to keep that, and port
  the selftests to use it.

- The iproute2 loader supports automatically populating map-in-map
  definitions on load. This needs to be added to libbpf as well. There
  is an implementation of this in the selftests in the kernel tree,
  which will have to be ported (related to the previous point).

- The iproute2 port needs to be completed; this means at least
  supporting TC eBPF programs as well, figuring out how to deal with
  cBPF programs, and getting the verbose output back to the same state
  as before the port. Also, I guess the iproute2 maintainers need to ACK
  that they are good with adding a dependency on libbpf.

- Some of the code added to libbpf in patch 2 in this series include
  code derived from iproute2, which is GPLv2+. So it will need to be
  re-licensed to be usable in libbpf. Since `git blame` indicated that
  the original code was written by Daniel, I figure he can ACK that
  relicensing before applying the patches :)


Please take a look at this series and let me know if you agree
that this is the right direction to go. Assuming there's consensus that
it is, I'll focus on getting the rest of the libbpf patches ready for
merging. I'll send those as a separate series, and hold off on the
iproute2 patches until they are merged; but for this version I'm
including both in one series so it's easier to see the context.

-Toke


libbpf:
Toke Høiland-Jørgensen (3):
  libbpf: Add map definition struct fields from iproute2
  libbpf: Add support for auto-pinning of maps with reuse on program
    load
  libbpf: Add support for specifying map pinning path via callback

 tools/lib/bpf/libbpf.c | 205 +++++++++++++++++++++++++++++++++++++++--
 tools/lib/bpf/libbpf.h |  18 ++++
 2 files changed, 214 insertions(+), 9 deletions(-)

iproute2:
Toke Høiland-Jørgensen (2):
  iproute2: Allow compiling against libbpf
  iproute2: Support loading XDP programs with libbpf

 configure          |  16 +++++
 include/bpf_util.h |   6 +-
 ip/ipvrf.c         |   4 +-
 lib/bpf.c          | 148 ++++++++++++++++++++++++++++++++++++---------
 4 files changed, 142 insertions(+), 32 deletions(-)

Comments

Alexei Starovoitov Aug. 21, 2019, 7:26 p.m. UTC | #1
On Tue, Aug 20, 2019 at 01:47:01PM +0200, Toke Høiland-Jørgensen wrote:
> iproute2 uses its own bpf loader to load eBPF programs, which has
> evolved separately from libbpf. Since we are now standardising on
> libbpf, this becomes a problem as iproute2 is slowly accumulating
> feature incompatibilities with libbpf-based loaders. In particular,
> iproute2 has its own (expanded) version of the map definition struct,
> which makes it difficult to write programs that can be loaded with both
> custom loaders and iproute2.
> 
> This series seeks to address this by converting iproute2 to using libbpf
> for all its bpf needs. This version is an early proof-of-concept RFC, to
> get some feedback on whether people think this is the right direction.
> 
> What this series does is the following:
> 
> - Updates the libbpf map definition struct to match that of iproute2
>   (patch 1).
> - Adds functionality to libbpf to support automatic pinning of maps when
>   loading an eBPF program, while re-using pinned maps if they already
>   exist (patches 2-3).
> - Modifies iproute2 to make it possible to compile it against libbpf
>   without affecting any existing functionality (patch 4).
> - Changes the iproute2 eBPF loader to use libbpf for loading XDP
>   programs (patch 5).
> 
> 
> As this is an early PoC, there are still a few missing pieces before
> this can be merged. Including (but probably not limited to):
> 
> - Consolidate the map definition struct in the bpf_helpers.h file in the
>   kernel tree. This contains a different, and incompatible, update to
>   the struct. Since the iproute2 version has actually been released for
>   use outside the kernel tree (and thus is subject to API stability
>   constraints), I think it makes the most sense to keep that, and port
>   the selftests to use it.

It sounds like you're implying that existing libbpf format is not uapi.
It is and we cannot break it.
If patch 1 means breakage for existing pre-compiled .o that won't load
with new libbpf then we cannot use this method.
Recompiling .o with new libbpf definition of bpf_map_def isn't an option.
libbpf has to be smart before/after and recognize both old and iproute2 format.
Andrii Nakryiko Aug. 21, 2019, 8:30 p.m. UTC | #2
On Tue, Aug 20, 2019 at 4:47 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> iproute2 uses its own bpf loader to load eBPF programs, which has
> evolved separately from libbpf. Since we are now standardising on
> libbpf, this becomes a problem as iproute2 is slowly accumulating
> feature incompatibilities with libbpf-based loaders. In particular,
> iproute2 has its own (expanded) version of the map definition struct,
> which makes it difficult to write programs that can be loaded with both
> custom loaders and iproute2.
>
> This series seeks to address this by converting iproute2 to using libbpf
> for all its bpf needs. This version is an early proof-of-concept RFC, to
> get some feedback on whether people think this is the right direction.
>
> What this series does is the following:
>
> - Updates the libbpf map definition struct to match that of iproute2
>   (patch 1).


Hi Toke,

Thanks for taking a stab at unifying libbpf and iproute2 loaders. I'm
totally in support of making iproute2 use libbpf to load/initialize
BPF programs. But I'm against adding iproute2-specific fields to
libbpf's bpf_map_def definitions to support this.

I've proposed the plan of extending libbpf's supported features so
that it can be used to load iproute2-style BPF programs earlier,
please see discussions in [0] and [1]. I think instead of emulating
iproute2 way of matching everything based on user-specified internal
IDs, which doesn't provide good user experience and is quite easy to
get wrong, we should support same scenarios with better declarative
syntax and in a less error-prone way. I believe we can do that by
relying on BTF more heavily (again, please check some of my proposals
in [0], [1], and discussion with Daniel in those threads). It will
feel more natural and be more straightforward to follow. It would be
great if you can lend a hand in implementing pieces of that plan!

I'm currently on vacation, so my availability is very sparse, but I'd
be happy to discuss this further, if need be.

  [0] https://lore.kernel.org/bpf/CAEf4BzbfdG2ub7gCi0OYqBrUoChVHWsmOntWAkJt47=FE+km+A@xxxxxxxxxxxxxx/
  [1] https://www.spinics.net/lists/bpf/msg03976.html

> - Adds functionality to libbpf to support automatic pinning of maps when
>   loading an eBPF program, while re-using pinned maps if they already
>   exist (patches 2-3).
> - Modifies iproute2 to make it possible to compile it against libbpf
>   without affecting any existing functionality (patch 4).
> - Changes the iproute2 eBPF loader to use libbpf for loading XDP
>   programs (patch 5).
>
>
> As this is an early PoC, there are still a few missing pieces before
> this can be merged. Including (but probably not limited to):
>
> - Consolidate the map definition struct in the bpf_helpers.h file in the
>   kernel tree. This contains a different, and incompatible, update to
>   the struct. Since the iproute2 version has actually been released for
>   use outside the kernel tree (and thus is subject to API stability
>   constraints), I think it makes the most sense to keep that, and port
>   the selftests to use it.
>
> - The iproute2 loader supports automatically populating map-in-map
>   definitions on load. This needs to be added to libbpf as well. There
>   is an implementation of this in the selftests in the kernel tree,
>   which will have to be ported (related to the previous point).
>
> - The iproute2 port needs to be completed; this means at least
>   supporting TC eBPF programs as well, figuring out how to deal with
>   cBPF programs, and getting the verbose output back to the same state
>   as before the port. Also, I guess the iproute2 maintainers need to ACK
>   that they are good with adding a dependency on libbpf.
>
> - Some of the code added to libbpf in patch 2 in this series include
>   code derived from iproute2, which is GPLv2+. So it will need to be
>   re-licensed to be usable in libbpf. Since `git blame` indicated that
>   the original code was written by Daniel, I figure he can ACK that
>   relicensing before applying the patches :)
>
>
> Please take a look at this series and let me know if you agree
> that this is the right direction to go. Assuming there's consensus that
> it is, I'll focus on getting the rest of the libbpf patches ready for
> merging. I'll send those as a separate series, and hold off on the
> iproute2 patches until they are merged; but for this version I'm
> including both in one series so it's easier to see the context.
>
> -Toke
>
>
> libbpf:
> Toke Høiland-Jørgensen (3):
>   libbpf: Add map definition struct fields from iproute2
>   libbpf: Add support for auto-pinning of maps with reuse on program
>     load
>   libbpf: Add support for specifying map pinning path via callback
>
>  tools/lib/bpf/libbpf.c | 205 +++++++++++++++++++++++++++++++++++++++--
>  tools/lib/bpf/libbpf.h |  18 ++++
>  2 files changed, 214 insertions(+), 9 deletions(-)
>
> iproute2:
> Toke Høiland-Jørgensen (2):
>   iproute2: Allow compiling against libbpf
>   iproute2: Support loading XDP programs with libbpf
>
>  configure          |  16 +++++
>  include/bpf_util.h |   6 +-
>  ip/ipvrf.c         |   4 +-
>  lib/bpf.c          | 148 ++++++++++++++++++++++++++++++++++++---------
>  4 files changed, 142 insertions(+), 32 deletions(-)
>
>
> --
> 2.22.1
>
Toke Høiland-Jørgensen Aug. 21, 2019, 9 p.m. UTC | #3
Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:

> On Tue, Aug 20, 2019 at 01:47:01PM +0200, Toke Høiland-Jørgensen wrote:
>> iproute2 uses its own bpf loader to load eBPF programs, which has
>> evolved separately from libbpf. Since we are now standardising on
>> libbpf, this becomes a problem as iproute2 is slowly accumulating
>> feature incompatibilities with libbpf-based loaders. In particular,
>> iproute2 has its own (expanded) version of the map definition struct,
>> which makes it difficult to write programs that can be loaded with both
>> custom loaders and iproute2.
>> 
>> This series seeks to address this by converting iproute2 to using libbpf
>> for all its bpf needs. This version is an early proof-of-concept RFC, to
>> get some feedback on whether people think this is the right direction.
>> 
>> What this series does is the following:
>> 
>> - Updates the libbpf map definition struct to match that of iproute2
>>   (patch 1).
>> - Adds functionality to libbpf to support automatic pinning of maps when
>>   loading an eBPF program, while re-using pinned maps if they already
>>   exist (patches 2-3).
>> - Modifies iproute2 to make it possible to compile it against libbpf
>>   without affecting any existing functionality (patch 4).
>> - Changes the iproute2 eBPF loader to use libbpf for loading XDP
>>   programs (patch 5).
>> 
>> 
>> As this is an early PoC, there are still a few missing pieces before
>> this can be merged. Including (but probably not limited to):
>> 
>> - Consolidate the map definition struct in the bpf_helpers.h file in the
>>   kernel tree. This contains a different, and incompatible, update to
>>   the struct. Since the iproute2 version has actually been released for
>>   use outside the kernel tree (and thus is subject to API stability
>>   constraints), I think it makes the most sense to keep that, and port
>>   the selftests to use it.
>
> It sounds like you're implying that existing libbpf format is not
> uapi.

No, that's not what I meant... See below.

> It is and we cannot break it.
> If patch 1 means breakage for existing pre-compiled .o that won't load
> with new libbpf then we cannot use this method.
> Recompiling .o with new libbpf definition of bpf_map_def isn't an option.
> libbpf has to be smart before/after and recognize both old and iproute2 format.

The libbpf.h definition of struct bpf_map_def is compatible with the one
used in iproute2. In libbpf.h, the struct only contains five fields
(type, key_size, value_size, max_entries and flags), and iproute2 adds
another 4 (id, pinning, inner_id and inner_idx; these are the ones in
patch 1 in this series).

The issue I was alluding to above is that the bpf_helpers.h file in the
kernel selftests directory *also* extends the bpf_map_def struct, and
adds two *different* fields (inner_map_idx and numa_mode). The former is
used to implement the same map-in-map definition functionality that
iproute2 has, but with different semantics. The latter is additional to
that, and I'm planning to add that to this series.

Since bpf_helpers.h is *not* part of libbpf (yet), this will make it
possible to keep API (and ABI) compatibility with both iproute2 and
libbpf. As in, old .o files will still load with libbpf after this
series, they just won't be able to use the new automatic pinning
feature.

-Toke
Toke Høiland-Jørgensen Aug. 21, 2019, 9:07 p.m. UTC | #4
Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:

> On Tue, Aug 20, 2019 at 4:47 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>
>> iproute2 uses its own bpf loader to load eBPF programs, which has
>> evolved separately from libbpf. Since we are now standardising on
>> libbpf, this becomes a problem as iproute2 is slowly accumulating
>> feature incompatibilities with libbpf-based loaders. In particular,
>> iproute2 has its own (expanded) version of the map definition struct,
>> which makes it difficult to write programs that can be loaded with both
>> custom loaders and iproute2.
>>
>> This series seeks to address this by converting iproute2 to using libbpf
>> for all its bpf needs. This version is an early proof-of-concept RFC, to
>> get some feedback on whether people think this is the right direction.
>>
>> What this series does is the following:
>>
>> - Updates the libbpf map definition struct to match that of iproute2
>>   (patch 1).
>
>
> Hi Toke,
>
> Thanks for taking a stab at unifying libbpf and iproute2 loaders. I'm
> totally in support of making iproute2 use libbpf to load/initialize
> BPF programs. But I'm against adding iproute2-specific fields to
> libbpf's bpf_map_def definitions to support this.
>
> I've proposed the plan of extending libbpf's supported features so
> that it can be used to load iproute2-style BPF programs earlier,
> please see discussions in [0] and [1].

Yeah, I've seen that discussion, and agree that longer term this is
probably a better way to do map-in-map definitions.

However, I view your proposal as complementary to this series: we'll
probably also want the BTF-based definition to work with iproute2, and
that means iproute2 needs to be ported to libbpf. But iproute2 needs to
be backwards compatible with the format it supports now, and, well, this
series is the simplest way to achieve that IMO :)

> I think instead of emulating iproute2 way of matching everything based
> on user-specified internal IDs, which doesn't provide good user
> experience and is quite easy to get wrong, we should support same
> scenarios with better declarative syntax and in a less error-prone
> way. I believe we can do that by relying on BTF more heavily (again,
> please check some of my proposals in [0], [1], and discussion with
> Daniel in those threads). It will feel more natural and be more
> straightforward to follow. It would be great if you can lend a hand in
> implementing pieces of that plan!
>
> I'm currently on vacation, so my availability is very sparse, but I'd
> be happy to discuss this further, if need be.

Happy to collaborate on your proposal when you're back from vacation;
but as I said above, I believe this is a complementary longer-term
thing...

-Toke
Andrii Nakryiko Aug. 22, 2019, 7:49 a.m. UTC | #5
On Wed, Aug 21, 2019 at 2:07 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
>
> > On Tue, Aug 20, 2019 at 4:47 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> >>
> >> iproute2 uses its own bpf loader to load eBPF programs, which has
> >> evolved separately from libbpf. Since we are now standardising on
> >> libbpf, this becomes a problem as iproute2 is slowly accumulating
> >> feature incompatibilities with libbpf-based loaders. In particular,
> >> iproute2 has its own (expanded) version of the map definition struct,
> >> which makes it difficult to write programs that can be loaded with both
> >> custom loaders and iproute2.
> >>
> >> This series seeks to address this by converting iproute2 to using libbpf
> >> for all its bpf needs. This version is an early proof-of-concept RFC, to
> >> get some feedback on whether people think this is the right direction.
> >>
> >> What this series does is the following:
> >>
> >> - Updates the libbpf map definition struct to match that of iproute2
> >>   (patch 1).
> >
> >
> > Hi Toke,
> >
> > Thanks for taking a stab at unifying libbpf and iproute2 loaders. I'm
> > totally in support of making iproute2 use libbpf to load/initialize
> > BPF programs. But I'm against adding iproute2-specific fields to
> > libbpf's bpf_map_def definitions to support this.
> >
> > I've proposed the plan of extending libbpf's supported features so
> > that it can be used to load iproute2-style BPF programs earlier,
> > please see discussions in [0] and [1].
>
> Yeah, I've seen that discussion, and agree that longer term this is
> probably a better way to do map-in-map definitions.
>
> However, I view your proposal as complementary to this series: we'll
> probably also want the BTF-based definition to work with iproute2, and
> that means iproute2 needs to be ported to libbpf. But iproute2 needs to
> be backwards compatible with the format it supports now, and, well, this
> series is the simplest way to achieve that IMO :)

Ok, I understand that. But I'd still want to avoid adding extra cruft
to libbpf just for backwards-compatibility with *exact* iproute2
format. Libbpf as a whole is trying to move away from relying on
binary bpf_map_def and into using BTF-defined map definitions, and
this patch series is a step backwards in that regard, that adds,
essentially, already outdated stuff that we'll need to support forever
(I mean those extra fields in bpf_map_def, that will stay there
forever).

We've discussed one way to deal with it, IMO, in a cleaner way. It can
be done in few steps:

1. I originally wanted BTF-defined map definitions to ignore unknown
fields. It shouldn't be a default mode, but it should be supported
(and of course is very easy to add). So let's add that and let libbpf
ignore unknown stuff.

2. Then to let iproute2 loader deal with backwards-compatibility for
libbpf-incompatible bpf_elf_map, we need to "pass-through" all those
fields so that users of libbpf (iproute2 loader, in this case) can
make use of it. The easiest and cleanest way to do this is to expose
BTF ID of a type describing each map entry and let iproute2 process
that in whichever way it sees fit.

Luckily, bpf_elf_map is compatible in `type` field, which will let
libbpf recognize bpf_elf_map as map definition. All the rest setup
will be done by iproute2, by processing BTF of bpf_elf_map, which will
let it set up map sizes, flags and do all of its map-in-map magic.

The only additions to libbpf in this case would be a new `__u32
bpf_map__btf_id(struct bpf_map* map);` API.

I haven't written any code and haven't 100% checked that this will
cover everything, but I think we should try. This will allow to let
users of libbpf do custom stuff with map definitions without having to
put all this extra logic into libbpf itself, which I think is
desirable outcome.


>
> > I think instead of emulating iproute2 way of matching everything based
> > on user-specified internal IDs, which doesn't provide good user
> > experience and is quite easy to get wrong, we should support same
> > scenarios with better declarative syntax and in a less error-prone
> > way. I believe we can do that by relying on BTF more heavily (again,
> > please check some of my proposals in [0], [1], and discussion with
> > Daniel in those threads). It will feel more natural and be more
> > straightforward to follow. It would be great if you can lend a hand in
> > implementing pieces of that plan!
> >
> > I'm currently on vacation, so my availability is very sparse, but I'd
> > be happy to discuss this further, if need be.
>
> Happy to collaborate on your proposal when you're back from vacation;
> but as I said above, I believe this is a complementary longer-term
> thing...
>
> -Toke
Andrii Nakryiko Aug. 22, 2019, 7:52 a.m. UTC | #6
On Wed, Aug 21, 2019 at 4:29 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:
>
> > On Tue, Aug 20, 2019 at 01:47:01PM +0200, Toke Høiland-Jørgensen wrote:
> >> iproute2 uses its own bpf loader to load eBPF programs, which has
> >> evolved separately from libbpf. Since we are now standardising on
> >> libbpf, this becomes a problem as iproute2 is slowly accumulating
> >> feature incompatibilities with libbpf-based loaders. In particular,
> >> iproute2 has its own (expanded) version of the map definition struct,
> >> which makes it difficult to write programs that can be loaded with both
> >> custom loaders and iproute2.
> >>
> >> This series seeks to address this by converting iproute2 to using libbpf
> >> for all its bpf needs. This version is an early proof-of-concept RFC, to
> >> get some feedback on whether people think this is the right direction.
> >>
> >> What this series does is the following:
> >>
> >> - Updates the libbpf map definition struct to match that of iproute2
> >>   (patch 1).
> >> - Adds functionality to libbpf to support automatic pinning of maps when
> >>   loading an eBPF program, while re-using pinned maps if they already
> >>   exist (patches 2-3).
> >> - Modifies iproute2 to make it possible to compile it against libbpf
> >>   without affecting any existing functionality (patch 4).
> >> - Changes the iproute2 eBPF loader to use libbpf for loading XDP
> >>   programs (patch 5).
> >>
> >>
> >> As this is an early PoC, there are still a few missing pieces before
> >> this can be merged. Including (but probably not limited to):
> >>
> >> - Consolidate the map definition struct in the bpf_helpers.h file in the
> >>   kernel tree. This contains a different, and incompatible, update to
> >>   the struct. Since the iproute2 version has actually been released for
> >>   use outside the kernel tree (and thus is subject to API stability
> >>   constraints), I think it makes the most sense to keep that, and port
> >>   the selftests to use it.
> >
> > It sounds like you're implying that existing libbpf format is not
> > uapi.
>
> No, that's not what I meant... See below.
>
> > It is and we cannot break it.
> > If patch 1 means breakage for existing pre-compiled .o that won't load
> > with new libbpf then we cannot use this method.
> > Recompiling .o with new libbpf definition of bpf_map_def isn't an option.
> > libbpf has to be smart before/after and recognize both old and iproute2 format.
>
> The libbpf.h definition of struct bpf_map_def is compatible with the one
> used in iproute2. In libbpf.h, the struct only contains five fields
> (type, key_size, value_size, max_entries and flags), and iproute2 adds
> another 4 (id, pinning, inner_id and inner_idx; these are the ones in
> patch 1 in this series).
>
> The issue I was alluding to above is that the bpf_helpers.h file in the
> kernel selftests directory *also* extends the bpf_map_def struct, and
> adds two *different* fields (inner_map_idx and numa_mode). The former is
> used to implement the same map-in-map definition functionality that
> iproute2 has, but with different semantics. The latter is additional to
> that, and I'm planning to add that to this series.
>
> Since bpf_helpers.h is *not* part of libbpf (yet), this will make it

We should start considering it as if it was, so if we can avoid adding
stuff that I'd need to untangle to move it into libbpf, I'd rather
avoid it.
We've already prepared this move by relicensing bpf_helpers.h. Moving
it into libbpf itself is immediate next thing I'll do when I'm back.

> possible to keep API (and ABI) compatibility with both iproute2 and
> libbpf. As in, old .o files will still load with libbpf after this
> series, they just won't be able to use the new automatic pinning
> feature.
>
> -Toke
Daniel Borkmann Aug. 22, 2019, 8:33 a.m. UTC | #7
On 8/22/19 9:49 AM, Andrii Nakryiko wrote:
> On Wed, Aug 21, 2019 at 2:07 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
>>> On Tue, Aug 20, 2019 at 4:47 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>>>
>>>> iproute2 uses its own bpf loader to load eBPF programs, which has
>>>> evolved separately from libbpf. Since we are now standardising on
>>>> libbpf, this becomes a problem as iproute2 is slowly accumulating
>>>> feature incompatibilities with libbpf-based loaders. In particular,
>>>> iproute2 has its own (expanded) version of the map definition struct,
>>>> which makes it difficult to write programs that can be loaded with both
>>>> custom loaders and iproute2.
>>>>
>>>> This series seeks to address this by converting iproute2 to using libbpf
>>>> for all its bpf needs. This version is an early proof-of-concept RFC, to
>>>> get some feedback on whether people think this is the right direction.
>>>>
>>>> What this series does is the following:
>>>>
>>>> - Updates the libbpf map definition struct to match that of iproute2
>>>>    (patch 1).
>>>
>>> Thanks for taking a stab at unifying libbpf and iproute2 loaders. I'm
>>> totally in support of making iproute2 use libbpf to load/initialize
>>> BPF programs. But I'm against adding iproute2-specific fields to
>>> libbpf's bpf_map_def definitions to support this.
>>>
>>> I've proposed the plan of extending libbpf's supported features so
>>> that it can be used to load iproute2-style BPF programs earlier,
>>> please see discussions in [0] and [1].
>>
>> Yeah, I've seen that discussion, and agree that longer term this is
>> probably a better way to do map-in-map definitions.
>>
>> However, I view your proposal as complementary to this series: we'll
>> probably also want the BTF-based definition to work with iproute2, and
>> that means iproute2 needs to be ported to libbpf. But iproute2 needs to
>> be backwards compatible with the format it supports now, and, well, this
>> series is the simplest way to achieve that IMO :)
> 
> Ok, I understand that. But I'd still want to avoid adding extra cruft
> to libbpf just for backwards-compatibility with *exact* iproute2
> format. Libbpf as a whole is trying to move away from relying on
> binary bpf_map_def and into using BTF-defined map definitions, and
> this patch series is a step backwards in that regard, that adds,
> essentially, already outdated stuff that we'll need to support forever
> (I mean those extra fields in bpf_map_def, that will stay there
> forever).

Agree, adding these extensions for libbpf would be a step backwards
compared to using BTF defined map defs.

> We've discussed one way to deal with it, IMO, in a cleaner way. It can
> be done in few steps:
> 
> 1. I originally wanted BTF-defined map definitions to ignore unknown
> fields. It shouldn't be a default mode, but it should be supported
> (and of course is very easy to add). So let's add that and let libbpf
> ignore unknown stuff.
> 
> 2. Then to let iproute2 loader deal with backwards-compatibility for
> libbpf-incompatible bpf_elf_map, we need to "pass-through" all those
> fields so that users of libbpf (iproute2 loader, in this case) can
> make use of it. The easiest and cleanest way to do this is to expose
> BTF ID of a type describing each map entry and let iproute2 process
> that in whichever way it sees fit.
> 
> Luckily, bpf_elf_map is compatible in `type` field, which will let
> libbpf recognize bpf_elf_map as map definition. All the rest setup
> will be done by iproute2, by processing BTF of bpf_elf_map, which will
> let it set up map sizes, flags and do all of its map-in-map magic.
> 
> The only additions to libbpf in this case would be a new `__u32
> bpf_map__btf_id(struct bpf_map* map);` API.
> 
> I haven't written any code and haven't 100% checked that this will
> cover everything, but I think we should try. This will allow to let
> users of libbpf do custom stuff with map definitions without having to
> put all this extra logic into libbpf itself, which I think is
> desirable outcome.

Sounds reasonable in general, but all this still has the issue that we're
assuming that BTF is /always/ present. Existing object files that would load
just fine /today/ but do not have BTF attached won't be handled here. Wouldn't
it be more straight forward to allow passing callbacks to the libbpf loader
such that if the map section is not found to be bpf_map_def compatible, we
rely on external user aka callback to parse the ELF section, handle any
non-default libbpf behavior like pinning/retrieving from BPF fs, populate
related internal libbpf map data structures and pass control back to libbpf
loader afterwards. (Similar callback with prog section name handling for the
case where tail call maps get automatically populated.)

Thanks,
Daniel
Toke Høiland-Jørgensen Aug. 22, 2019, 10:38 a.m. UTC | #8
Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:

> On Wed, Aug 21, 2019 at 4:29 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>
>> Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:
>>
>> > On Tue, Aug 20, 2019 at 01:47:01PM +0200, Toke Høiland-Jørgensen wrote:
>> >> iproute2 uses its own bpf loader to load eBPF programs, which has
>> >> evolved separately from libbpf. Since we are now standardising on
>> >> libbpf, this becomes a problem as iproute2 is slowly accumulating
>> >> feature incompatibilities with libbpf-based loaders. In particular,
>> >> iproute2 has its own (expanded) version of the map definition struct,
>> >> which makes it difficult to write programs that can be loaded with both
>> >> custom loaders and iproute2.
>> >>
>> >> This series seeks to address this by converting iproute2 to using libbpf
>> >> for all its bpf needs. This version is an early proof-of-concept RFC, to
>> >> get some feedback on whether people think this is the right direction.
>> >>
>> >> What this series does is the following:
>> >>
>> >> - Updates the libbpf map definition struct to match that of iproute2
>> >>   (patch 1).
>> >> - Adds functionality to libbpf to support automatic pinning of maps when
>> >>   loading an eBPF program, while re-using pinned maps if they already
>> >>   exist (patches 2-3).
>> >> - Modifies iproute2 to make it possible to compile it against libbpf
>> >>   without affecting any existing functionality (patch 4).
>> >> - Changes the iproute2 eBPF loader to use libbpf for loading XDP
>> >>   programs (patch 5).
>> >>
>> >>
>> >> As this is an early PoC, there are still a few missing pieces before
>> >> this can be merged. Including (but probably not limited to):
>> >>
>> >> - Consolidate the map definition struct in the bpf_helpers.h file in the
>> >>   kernel tree. This contains a different, and incompatible, update to
>> >>   the struct. Since the iproute2 version has actually been released for
>> >>   use outside the kernel tree (and thus is subject to API stability
>> >>   constraints), I think it makes the most sense to keep that, and port
>> >>   the selftests to use it.
>> >
>> > It sounds like you're implying that existing libbpf format is not
>> > uapi.
>>
>> No, that's not what I meant... See below.
>>
>> > It is and we cannot break it.
>> > If patch 1 means breakage for existing pre-compiled .o that won't load
>> > with new libbpf then we cannot use this method.
>> > Recompiling .o with new libbpf definition of bpf_map_def isn't an option.
>> > libbpf has to be smart before/after and recognize both old and iproute2 format.
>>
>> The libbpf.h definition of struct bpf_map_def is compatible with the one
>> used in iproute2. In libbpf.h, the struct only contains five fields
>> (type, key_size, value_size, max_entries and flags), and iproute2 adds
>> another 4 (id, pinning, inner_id and inner_idx; these are the ones in
>> patch 1 in this series).
>>
>> The issue I was alluding to above is that the bpf_helpers.h file in the
>> kernel selftests directory *also* extends the bpf_map_def struct, and
>> adds two *different* fields (inner_map_idx and numa_mode). The former is
>> used to implement the same map-in-map definition functionality that
>> iproute2 has, but with different semantics. The latter is additional to
>> that, and I'm planning to add that to this series.
>>
>> Since bpf_helpers.h is *not* part of libbpf (yet), this will make it
>
> We should start considering it as if it was, so if we can avoid adding
> stuff that I'd need to untangle to move it into libbpf, I'd rather
> avoid it.
> We've already prepared this move by relicensing bpf_helpers.h. Moving
> it into libbpf itself is immediate next thing I'll do when I'm back.

Yeah, I figured that with the relicensing, bpf_helpers would probably be
making its way into libbpf soon. Which is why I wanted to start this
discussion before that: If we do move bpf_helpers as-is, that will put
us in the territory of full-on binary incompatibility. So the time to
discuss doing this in a compatible way is now, before any such move is
made.

-Toke
Toke Høiland-Jørgensen Aug. 22, 2019, 11:48 a.m. UTC | #9
Daniel Borkmann <daniel@iogearbox.net> writes:

> On 8/22/19 9:49 AM, Andrii Nakryiko wrote:
>> On Wed, Aug 21, 2019 at 2:07 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
>>>> On Tue, Aug 20, 2019 at 4:47 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>>>>
>>>>> iproute2 uses its own bpf loader to load eBPF programs, which has
>>>>> evolved separately from libbpf. Since we are now standardising on
>>>>> libbpf, this becomes a problem as iproute2 is slowly accumulating
>>>>> feature incompatibilities with libbpf-based loaders. In particular,
>>>>> iproute2 has its own (expanded) version of the map definition struct,
>>>>> which makes it difficult to write programs that can be loaded with both
>>>>> custom loaders and iproute2.
>>>>>
>>>>> This series seeks to address this by converting iproute2 to using libbpf
>>>>> for all its bpf needs. This version is an early proof-of-concept RFC, to
>>>>> get some feedback on whether people think this is the right direction.
>>>>>
>>>>> What this series does is the following:
>>>>>
>>>>> - Updates the libbpf map definition struct to match that of iproute2
>>>>>    (patch 1).
>>>>
>>>> Thanks for taking a stab at unifying libbpf and iproute2 loaders. I'm
>>>> totally in support of making iproute2 use libbpf to load/initialize
>>>> BPF programs. But I'm against adding iproute2-specific fields to
>>>> libbpf's bpf_map_def definitions to support this.
>>>>
>>>> I've proposed the plan of extending libbpf's supported features so
>>>> that it can be used to load iproute2-style BPF programs earlier,
>>>> please see discussions in [0] and [1].
>>>
>>> Yeah, I've seen that discussion, and agree that longer term this is
>>> probably a better way to do map-in-map definitions.
>>>
>>> However, I view your proposal as complementary to this series: we'll
>>> probably also want the BTF-based definition to work with iproute2, and
>>> that means iproute2 needs to be ported to libbpf. But iproute2 needs to
>>> be backwards compatible with the format it supports now, and, well, this
>>> series is the simplest way to achieve that IMO :)
>> 
>> Ok, I understand that. But I'd still want to avoid adding extra cruft
>> to libbpf just for backwards-compatibility with *exact* iproute2
>> format. Libbpf as a whole is trying to move away from relying on
>> binary bpf_map_def and into using BTF-defined map definitions, and
>> this patch series is a step backwards in that regard, that adds,
>> essentially, already outdated stuff that we'll need to support forever
>> (I mean those extra fields in bpf_map_def, that will stay there
>> forever).
>
> Agree, adding these extensions for libbpf would be a step backwards
> compared to using BTF defined map defs.
>
>> We've discussed one way to deal with it, IMO, in a cleaner way. It can
>> be done in few steps:
>> 
>> 1. I originally wanted BTF-defined map definitions to ignore unknown
>> fields. It shouldn't be a default mode, but it should be supported
>> (and of course is very easy to add). So let's add that and let libbpf
>> ignore unknown stuff.
>> 
>> 2. Then to let iproute2 loader deal with backwards-compatibility for
>> libbpf-incompatible bpf_elf_map, we need to "pass-through" all those
>> fields so that users of libbpf (iproute2 loader, in this case) can
>> make use of it. The easiest and cleanest way to do this is to expose
>> BTF ID of a type describing each map entry and let iproute2 process
>> that in whichever way it sees fit.
>> 
>> Luckily, bpf_elf_map is compatible in `type` field, which will let
>> libbpf recognize bpf_elf_map as map definition. All the rest setup
>> will be done by iproute2, by processing BTF of bpf_elf_map, which will
>> let it set up map sizes, flags and do all of its map-in-map magic.
>> 
>> The only additions to libbpf in this case would be a new `__u32
>> bpf_map__btf_id(struct bpf_map* map);` API.
>> 
>> I haven't written any code and haven't 100% checked that this will
>> cover everything, but I think we should try. This will allow to let
>> users of libbpf do custom stuff with map definitions without having to
>> put all this extra logic into libbpf itself, which I think is
>> desirable outcome.
>
> Sounds reasonable in general, but all this still has the issue that
> we're assuming that BTF is /always/ present. Existing object files
> that would load just fine /today/ but do not have BTF attached won't
> be handled here. Wouldn't it be more straight forward to allow passing
> callbacks to the libbpf loader such that if the map section is not
> found to be bpf_map_def compatible, we rely on external user aka
> callback to parse the ELF section, handle any non-default libbpf
> behavior like pinning/retrieving from BPF fs, populate related
> internal libbpf map data structures and pass control back to libbpf
> loader afterwards. (Similar callback with prog section name handling
> for the case where tail call maps get automatically populated.)

Thinking about this some more, I think there are two separate issues
here:

1. Do we want libbpf to support the features currently in iproute2 and
   bpf_helpers (i.e., map pinning + reuse, map-in-map definitions, and
   NUMA node placement of maps). IMO the answer to this is yes.

2. What should the data format be for BPF programs to signal that they
   want to use those features? Here, the longer-term answer is BTF-based
   map definitions, but we still want iproute2 to be backwards
   compatible.


So how about I revise this patch series to implement the *features* (I
already implemented map-in-map and numa nodes[0], so that is sorta
already done), but instead of extending the bpf_map_def struct, I just
expose callbacks that will allow programs to fill in internal-to-libbpf
data structures with the required information. Then, once the BTF-based
map definition does land, that can simply define default callbacks that
uses the BTF information to fill in those same internal data structures?

This would mean no extending bpf_map_def, and relaxing the current
libbpf restriction on extending bpf_map_def.

The drawback of this approach is that it does nothing to combat
fragmentation: People building their own loaders can still reimplement
different semantics for map defs, leading to programs that are tied to a
particular loader. So this would only work if we really believe BTF can
save us from this. I don't feel competent to comment on this just yet,
but thought I'd mention it :)

-Toke
Toke Høiland-Jørgensen Aug. 22, 2019, 11:49 a.m. UTC | #10
Toke Høiland-Jørgensen <toke@redhat.com> writes:

> Daniel Borkmann <daniel@iogearbox.net> writes:
>
>> On 8/22/19 9:49 AM, Andrii Nakryiko wrote:
>>> On Wed, Aug 21, 2019 at 2:07 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>>> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
>>>>> On Tue, Aug 20, 2019 at 4:47 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>>>>>
>>>>>> iproute2 uses its own bpf loader to load eBPF programs, which has
>>>>>> evolved separately from libbpf. Since we are now standardising on
>>>>>> libbpf, this becomes a problem as iproute2 is slowly accumulating
>>>>>> feature incompatibilities with libbpf-based loaders. In particular,
>>>>>> iproute2 has its own (expanded) version of the map definition struct,
>>>>>> which makes it difficult to write programs that can be loaded with both
>>>>>> custom loaders and iproute2.
>>>>>>
>>>>>> This series seeks to address this by converting iproute2 to using libbpf
>>>>>> for all its bpf needs. This version is an early proof-of-concept RFC, to
>>>>>> get some feedback on whether people think this is the right direction.
>>>>>>
>>>>>> What this series does is the following:
>>>>>>
>>>>>> - Updates the libbpf map definition struct to match that of iproute2
>>>>>>    (patch 1).
>>>>>
>>>>> Thanks for taking a stab at unifying libbpf and iproute2 loaders. I'm
>>>>> totally in support of making iproute2 use libbpf to load/initialize
>>>>> BPF programs. But I'm against adding iproute2-specific fields to
>>>>> libbpf's bpf_map_def definitions to support this.
>>>>>
>>>>> I've proposed the plan of extending libbpf's supported features so
>>>>> that it can be used to load iproute2-style BPF programs earlier,
>>>>> please see discussions in [0] and [1].
>>>>
>>>> Yeah, I've seen that discussion, and agree that longer term this is
>>>> probably a better way to do map-in-map definitions.
>>>>
>>>> However, I view your proposal as complementary to this series: we'll
>>>> probably also want the BTF-based definition to work with iproute2, and
>>>> that means iproute2 needs to be ported to libbpf. But iproute2 needs to
>>>> be backwards compatible with the format it supports now, and, well, this
>>>> series is the simplest way to achieve that IMO :)
>>> 
>>> Ok, I understand that. But I'd still want to avoid adding extra cruft
>>> to libbpf just for backwards-compatibility with *exact* iproute2
>>> format. Libbpf as a whole is trying to move away from relying on
>>> binary bpf_map_def and into using BTF-defined map definitions, and
>>> this patch series is a step backwards in that regard, that adds,
>>> essentially, already outdated stuff that we'll need to support forever
>>> (I mean those extra fields in bpf_map_def, that will stay there
>>> forever).
>>
>> Agree, adding these extensions for libbpf would be a step backwards
>> compared to using BTF defined map defs.
>>
>>> We've discussed one way to deal with it, IMO, in a cleaner way. It can
>>> be done in few steps:
>>> 
>>> 1. I originally wanted BTF-defined map definitions to ignore unknown
>>> fields. It shouldn't be a default mode, but it should be supported
>>> (and of course is very easy to add). So let's add that and let libbpf
>>> ignore unknown stuff.
>>> 
>>> 2. Then to let iproute2 loader deal with backwards-compatibility for
>>> libbpf-incompatible bpf_elf_map, we need to "pass-through" all those
>>> fields so that users of libbpf (iproute2 loader, in this case) can
>>> make use of it. The easiest and cleanest way to do this is to expose
>>> BTF ID of a type describing each map entry and let iproute2 process
>>> that in whichever way it sees fit.
>>> 
>>> Luckily, bpf_elf_map is compatible in `type` field, which will let
>>> libbpf recognize bpf_elf_map as map definition. All the rest setup
>>> will be done by iproute2, by processing BTF of bpf_elf_map, which will
>>> let it set up map sizes, flags and do all of its map-in-map magic.
>>> 
>>> The only additions to libbpf in this case would be a new `__u32
>>> bpf_map__btf_id(struct bpf_map* map);` API.
>>> 
>>> I haven't written any code and haven't 100% checked that this will
>>> cover everything, but I think we should try. This will allow to let
>>> users of libbpf do custom stuff with map definitions without having to
>>> put all this extra logic into libbpf itself, which I think is
>>> desirable outcome.
>>
>> Sounds reasonable in general, but all this still has the issue that
>> we're assuming that BTF is /always/ present. Existing object files
>> that would load just fine /today/ but do not have BTF attached won't
>> be handled here. Wouldn't it be more straight forward to allow passing
>> callbacks to the libbpf loader such that if the map section is not
>> found to be bpf_map_def compatible, we rely on external user aka
>> callback to parse the ELF section, handle any non-default libbpf
>> behavior like pinning/retrieving from BPF fs, populate related
>> internal libbpf map data structures and pass control back to libbpf
>> loader afterwards. (Similar callback with prog section name handling
>> for the case where tail call maps get automatically populated.)
>
> Thinking about this some more, I think there are two separate issues
> here:
>
> 1. Do we want libbpf to support the features currently in iproute2 and
>    bpf_helpers (i.e., map pinning + reuse, map-in-map definitions, and
>    NUMA node placement of maps). IMO the answer to this is yes.
>
> 2. What should the data format be for BPF programs to signal that they
>    want to use those features? Here, the longer-term answer is BTF-based
>    map definitions, but we still want iproute2 to be backwards
>    compatible.
>
>
> So how about I revise this patch series to implement the *features* (I
> already implemented map-in-map and numa nodes[0], so that is sorta
> already done), but instead of extending the bpf_map_def struct, I just
> expose callbacks that will allow programs to fill in internal-to-libbpf
> data structures with the required information. Then, once the BTF-based
> map definition does land, that can simply define default callbacks that
> uses the BTF information to fill in those same internal data structures?
>
> This would mean no extending bpf_map_def, and relaxing the current
> libbpf restriction on extending bpf_map_def.
>
> The drawback of this approach is that it does nothing to combat
> fragmentation: People building their own loaders can still reimplement
> different semantics for map defs, leading to programs that are tied to a
> particular loader. So this would only work if we really believe BTF can
> save us from this. I don't feel competent to comment on this just yet,
> but thought I'd mention it :)
>
> -Toke

[0] was supposed to be a reference to
    https://github.com/tohojo/libbpf/tree/iproute2-compat
Andrii Nakryiko Aug. 23, 2019, 6:31 a.m. UTC | #11
On Thu, Aug 22, 2019 at 1:33 AM Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> On 8/22/19 9:49 AM, Andrii Nakryiko wrote:
> > On Wed, Aug 21, 2019 at 2:07 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> >> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
> >>> On Tue, Aug 20, 2019 at 4:47 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> >>>>
> >>>> iproute2 uses its own bpf loader to load eBPF programs, which has
> >>>> evolved separately from libbpf. Since we are now standardising on
> >>>> libbpf, this becomes a problem as iproute2 is slowly accumulating
> >>>> feature incompatibilities with libbpf-based loaders. In particular,
> >>>> iproute2 has its own (expanded) version of the map definition struct,
> >>>> which makes it difficult to write programs that can be loaded with both
> >>>> custom loaders and iproute2.
> >>>>
> >>>> This series seeks to address this by converting iproute2 to using libbpf
> >>>> for all its bpf needs. This version is an early proof-of-concept RFC, to
> >>>> get some feedback on whether people think this is the right direction.
> >>>>
> >>>> What this series does is the following:
> >>>>
> >>>> - Updates the libbpf map definition struct to match that of iproute2
> >>>>    (patch 1).
> >>>
> >>> Thanks for taking a stab at unifying libbpf and iproute2 loaders. I'm
> >>> totally in support of making iproute2 use libbpf to load/initialize
> >>> BPF programs. But I'm against adding iproute2-specific fields to
> >>> libbpf's bpf_map_def definitions to support this.
> >>>
> >>> I've proposed the plan of extending libbpf's supported features so
> >>> that it can be used to load iproute2-style BPF programs earlier,
> >>> please see discussions in [0] and [1].
> >>
> >> Yeah, I've seen that discussion, and agree that longer term this is
> >> probably a better way to do map-in-map definitions.
> >>
> >> However, I view your proposal as complementary to this series: we'll
> >> probably also want the BTF-based definition to work with iproute2, and
> >> that means iproute2 needs to be ported to libbpf. But iproute2 needs to
> >> be backwards compatible with the format it supports now, and, well, this
> >> series is the simplest way to achieve that IMO :)
> >
> > Ok, I understand that. But I'd still want to avoid adding extra cruft
> > to libbpf just for backwards-compatibility with *exact* iproute2
> > format. Libbpf as a whole is trying to move away from relying on
> > binary bpf_map_def and into using BTF-defined map definitions, and
> > this patch series is a step backwards in that regard, that adds,
> > essentially, already outdated stuff that we'll need to support forever
> > (I mean those extra fields in bpf_map_def, that will stay there
> > forever).
>
> Agree, adding these extensions for libbpf would be a step backwards
> compared to using BTF defined map defs.
>
> > We've discussed one way to deal with it, IMO, in a cleaner way. It can
> > be done in few steps:
> >
> > 1. I originally wanted BTF-defined map definitions to ignore unknown
> > fields. It shouldn't be a default mode, but it should be supported
> > (and of course is very easy to add). So let's add that and let libbpf
> > ignore unknown stuff.
> >
> > 2. Then to let iproute2 loader deal with backwards-compatibility for
> > libbpf-incompatible bpf_elf_map, we need to "pass-through" all those
> > fields so that users of libbpf (iproute2 loader, in this case) can
> > make use of it. The easiest and cleanest way to do this is to expose
> > BTF ID of a type describing each map entry and let iproute2 process
> > that in whichever way it sees fit.
> >
> > Luckily, bpf_elf_map is compatible in `type` field, which will let
> > libbpf recognize bpf_elf_map as map definition. All the rest setup
> > will be done by iproute2, by processing BTF of bpf_elf_map, which will
> > let it set up map sizes, flags and do all of its map-in-map magic.
> >
> > The only additions to libbpf in this case would be a new `__u32
> > bpf_map__btf_id(struct bpf_map* map);` API.
> >
> > I haven't written any code and haven't 100% checked that this will
> > cover everything, but I think we should try. This will allow to let
> > users of libbpf do custom stuff with map definitions without having to
> > put all this extra logic into libbpf itself, which I think is
> > desirable outcome.
>
> Sounds reasonable in general, but all this still has the issue that we're
> assuming that BTF is /always/ present. Existing object files that would load
> just fine /today/ but do not have BTF attached won't be handled here. Wouldn't
> it be more straight forward to allow passing callbacks to the libbpf loader
> such that if the map section is not found to be bpf_map_def compatible, we
> rely on external user aka callback to parse the ELF section, handle any
> non-default libbpf behavior like pinning/retrieving from BPF fs, populate
> related internal libbpf map data structures and pass control back to libbpf

Having all those special callbacks feels a bit too narrow-focused. I
do agree that we need to provide enough flexibility to allow
non-standard BPF loaders like iproute2 to adjust and/or extend some of
BPF initialization logic, though. I'm just unsure if adding more and
more callbacks is the right approach.

Let's think slightly beyond iproute2, because iproute2 and libbpf use
the same ELF section name ("maps") for non-BTF-defined map defs, which
makes it easy to fall into the trap of designing too specific
solution. Let's take, say, BCC. BCC uses one section per each map.
And, unlike, iproute2, ELF section names don't coincide. So what if
BCC were to use libbpf as an underlying BPF loader, while preserving
its layout? This callback for "maps" section won't work at all.

BTW, I realize BCC might not be the best example here, given
on-the-fly complication nature of its programs and extensive usage of
macro, which provide quite a lot of flexibility in changing ELF
layout, if necessary, without changing user programs, but bear with
me, let's pretend we can't change some of those aspects easily. The
point I'm trying to make is that iproute2 and libbpf cases are
examples of almost compatible ELF layouts, and if we try to solve just
such case, we might end up inventing too specific solution.

What seems like a bit more flexible and generic solution is to make
libbpf API granular enough to allow users to adjust/add extra maps,
programs, relocations, whatever is necessary, before libbpf does final
loading or some other checks that would fail for normal libbpf
programs, but are ok for custom BPF programs. So instead of having
callbacks, we have API for each step, where custom loader can skip
and/or adjust behavior of each of them, while also implement their own
steps.

E.g., today's API is essentially three steps:

1. open and parse ELF: collect relos, programs, map definitions
2. load: create maps from collected defs, do program/global data/CO-RE
relocs, load and verify BPF programs
3. attach programs one by one.

Between step 1 and 2 user has flexibility to create more maps, set up
map-in-map, etc. Between 2 and 3 you can fill in global data, fill in
tail call maps, etc. That's already pretty flexible. But we can tune
and break apart those steps even further, if necessary.

E.g., for iproute2 maps, I think the biggest problem is that we can't
create a custom map dynamically, but still allow BPF instructions
relocations against that map. Plus, of course, a name clash of "maps"
ELF section with incompatible map definitions. So how about:

1. for open call, tell it to not parse "maps" section at all. We can
model that as an extra option to override map definitions ELF section
name. E.g., so that some applications can put their map defs into
"my_fancy_map_def_section" ELF section, for instance. If that section
is empty, though, libbpf will just skip step of collecting
bpf_map_defs (it can still do BTF-defined maps, btw), allowing gradual
migration.

2. provide API to add dynamically created (e.g., by iproute2 loader)
maps that are "relocatable to", before program relocations happen.
We'll need to figure out best interface here. Internally relocations
refer to maps as section index + offset and translate that to map
index. If we keep relos (internally) just as section index + offset,
it will be simple and consistent interface to add external maps that
can be relocated against, IMO. Map index is inconvenient in this case.

BTW, we'd need to answer some of those questions regardless, even for
callback-based solution. There are a bunch of details to be worked
out, of course, and I don't have exact answer to all of them right
now, but I think it's a worthwhile exercise to try to answer them and
see how API would look like.

As an aside, taking a step back and thinking about this whole API
design thing, it occurred to me that ELF is just one way to specify
programs, maps, relocations, global data, etc. But it doesn't have to
be just ELF, right? What if we have a use case where we have
"on-the-fly" creation of BPF program, with dynamically added maps,
dynamically generated BPF programs, etc. E.g., think about bpftrace
generating BPF object/program on-the-fly from their DSL language. Or
some pcap to BPF translator for firewall rules, etc, etc. It might be
inconvenient and unnecessary to generate ELF and then pass it to
libbpf to load it. Instead it could be more convenient to create an
empty bpf_object (bpf_object__new) and then use programmatic APIs to
add a bunch of maps (some sort of bpf_object__add_map) and progs
(bpf_object__add_program) with relocations against those maps, and let
libbpf do all the low-level stuff (relos, CO-RE, etc). And provide
high-level API for bpf_map, bpf_program, etc.

How API would look like to support this? Today's bpf_object__open +
bpf__object__load could be just a higher-level wrappers on top of
those APIs, constructing all the entities from standardized ELF
layout. But there still is low-level API to construct same bpf_object
construct dynamically. That seems flexible and powerful and not tied
to any particular use case, but of course requires a bit of thought
about best API.

Sorry for the wall of text :) Callback-based solutions always seem
convoluted to me, as well as hard to follow and quite often too
limited. This topic is not the easiest one to discuss over email, as
well, maybe we should chat about this while at LPC?

> loader afterwards. (Similar callback with prog section name handling for the
> case where tail call maps get automatically populated.)

I'm not sure I completely understand why we need this prog section
name callback. Can you elaborate what problem does it solve that can't
be solved with existing API?

>
> Thanks,
> Daniel
Jesper Dangaard Brouer Aug. 23, 2019, 10:27 a.m. UTC | #12
On Wed, 21 Aug 2019 13:30:09 -0700
Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:

> On Tue, Aug 20, 2019 at 4:47 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> >
> > iproute2 uses its own bpf loader to load eBPF programs, which has
> > evolved separately from libbpf. Since we are now standardising on
> > libbpf, this becomes a problem as iproute2 is slowly accumulating
> > feature incompatibilities with libbpf-based loaders. In particular,
> > iproute2 has its own (expanded) version of the map definition struct,
> > which makes it difficult to write programs that can be loaded with both
> > custom loaders and iproute2.
> >
> > This series seeks to address this by converting iproute2 to using libbpf
> > for all its bpf needs. This version is an early proof-of-concept RFC, to
> > get some feedback on whether people think this is the right direction.
> >
> > What this series does is the following:
> >
> > - Updates the libbpf map definition struct to match that of iproute2
> >   (patch 1).  
> 
> 
> Hi Toke,
> 
> Thanks for taking a stab at unifying libbpf and iproute2 loaders. I'm
> totally in support of making iproute2 use libbpf to load/initialize
> BPF programs. But I'm against adding iproute2-specific fields to
> libbpf's bpf_map_def definitions to support this.
> 
> I've proposed the plan of extending libbpf's supported features so
> that it can be used to load iproute2-style BPF programs earlier,
> please see discussions in [0] and [1]. I think instead of emulating
> iproute2 way of matching everything based on user-specified internal
> IDs, which doesn't provide good user experience and is quite easy to
> get wrong, we should support same scenarios with better declarative
> syntax and in a less error-prone way. I believe we can do that by
> relying on BTF more heavily (again, please check some of my proposals
> in [0], [1], and discussion with Daniel in those threads). It will
> feel more natural and be more straightforward to follow. It would be
> great if you can lend a hand in implementing pieces of that plan!
> 
> I'm currently on vacation, so my availability is very sparse, but I'd
> be happy to discuss this further, if need be.
> 
>   [0] https://lore.kernel.org/bpf/CAEf4BzbfdG2ub7gCi0OYqBrUoChVHWsmOntWAkJt47=FE+km+A@mail.gmail.com/
>   [1] https://www.spinics.net/lists/bpf/msg03976.html
> 
> > - Adds functionality to libbpf to support automatic pinning of maps when
> >   loading an eBPF program, while re-using pinned maps if they already
> >   exist (patches 2-3).

For production use-cases, libbpf really need an easier higher-level API
for re-using pinned maps, for establishing shared maps between
programs.  The existing libbpf API bpf_object__pin_maps() and
bpf_object__unpin_maps(), which don't re-use pinned maps, are not
really usable, because they pin/unpin ALL maps in the ELF file.

What users really need is an easy way to specify, on a per map basis,
what kind of pinning and reuse/sharing they want.  E.g. like iproute2
have, "global", "object-scope", and "no-pinning". ("ifindex-scope" would
be nice for XDP).
  Today users have to split/reimplement bpf_prog_load_xattr(), and
use/add bpf_map__reuse_fd().  Which is that I ended doing for
xdp-cpumap-tc[2] (used in production at ISP) resulting in 142 lines of
extra code[3] that should have been hidden inside libbpf.  And worse,
in this solution[4] the maps for reuse-pinning is specified in the code
by name.  Thus, they cannot use a generic loader.  That I why, I want
to mark the maps via a pinning member, like iproute2.

I really hope this moves in a practical direction, as I have the next
production request lined up (also from an ISP), and I hate to have to
advice them to choose the same route as [3].


[2] https://github.com/xdp-project/xdp-cpumap-tc/
[3] https://github.com/xdp-project/xdp-cpumap-tc/blob/master/src/xdp_iphash_to_cpu_user.c#L262-L403
[4] https://github.com/xdp-project/xdp-cpumap-tc/blob/master/src/xdp_iphash_to_cpu_user.c#L431-L441
Toke Høiland-Jørgensen Aug. 23, 2019, 11:29 a.m. UTC | #13
[ ... snip ...]

> E.g., today's API is essentially three steps:
>
> 1. open and parse ELF: collect relos, programs, map definitions
> 2. load: create maps from collected defs, do program/global data/CO-RE
> relocs, load and verify BPF programs
> 3. attach programs one by one.
>
> Between step 1 and 2 user has flexibility to create more maps, set up
> map-in-map, etc. Between 2 and 3 you can fill in global data, fill in
> tail call maps, etc. That's already pretty flexible. But we can tune
> and break apart those steps even further, if necessary.

Today, steps 1 and 2 can be collapsed into a single call to
bpf_prog_load_xattr(). As Jesper's mail explains, for XDP we don't
generally want to do all the fancy rewriting stuff, we just want a
simple way to load a program and get reusable pinning of maps.
Preferably in a way that is compatible with the iproute2 loader.

So I really think we need two things:

(1) a flexible API that splits up all the various steps in a way that
    allows programs to inject their own map definitions before
    relocations and loading

(2) a simple convenience wrapper that loads an object file, does
    something sensible with pinning and map-in-map definitions, and loads
    everything into the kernel.

I'd go so far as to say that (2) should even support system-wide
configuration, similar to the /etc/iproute2/bpf_pinning file. E.g., an
/etc/libbpf/pinning.conf file that sets the default pinning directory,
and makes it possible to set up pin-value-to-subdir mappings like what
iproute2 does today.

Having (2) makes it more likely that all the different custom loaders
will be compatible with each other, while still allowing people to do
their own custom thing with (1). And of course, (2) could be implemented
in terms of (1) internally in libbpf.

In my ideal world, (2) would just use the definition format already in
iproute2 (this is basically what I implemented already), but if you guys
don't want to put this into libbpf, I can probably live with the default
format being BTF-based instead. Which would mean that iproute2 I would
end up with a flow like this:

- When given an elf file, try to run it through the "standard loader"
  (2). If this works, great, proceed to program attach.

- If using (2) fails because it doesn't understand the map definition,
  fall back to a compatibility loader that parses the legacy iproute2
  map definition format and uses (1) to load that.


Does the above make sense? :)

-Toke
Andrii Nakryiko Aug. 28, 2019, 8:23 p.m. UTC | #14
On Fri, Aug 23, 2019 at 3:27 AM Jesper Dangaard Brouer
<brouer@redhat.com> wrote:
>
> On Wed, 21 Aug 2019 13:30:09 -0700
> Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
>
> > On Tue, Aug 20, 2019 at 4:47 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> > >
> > > iproute2 uses its own bpf loader to load eBPF programs, which has
> > > evolved separately from libbpf. Since we are now standardising on
> > > libbpf, this becomes a problem as iproute2 is slowly accumulating
> > > feature incompatibilities with libbpf-based loaders. In particular,
> > > iproute2 has its own (expanded) version of the map definition struct,
> > > which makes it difficult to write programs that can be loaded with both
> > > custom loaders and iproute2.
> > >
> > > This series seeks to address this by converting iproute2 to using libbpf
> > > for all its bpf needs. This version is an early proof-of-concept RFC, to
> > > get some feedback on whether people think this is the right direction.
> > >
> > > What this series does is the following:
> > >
> > > - Updates the libbpf map definition struct to match that of iproute2
> > >   (patch 1).
> >
> >
> > Hi Toke,
> >
> > Thanks for taking a stab at unifying libbpf and iproute2 loaders. I'm
> > totally in support of making iproute2 use libbpf to load/initialize
> > BPF programs. But I'm against adding iproute2-specific fields to
> > libbpf's bpf_map_def definitions to support this.
> >
> > I've proposed the plan of extending libbpf's supported features so
> > that it can be used to load iproute2-style BPF programs earlier,
> > please see discussions in [0] and [1]. I think instead of emulating
> > iproute2 way of matching everything based on user-specified internal
> > IDs, which doesn't provide good user experience and is quite easy to
> > get wrong, we should support same scenarios with better declarative
> > syntax and in a less error-prone way. I believe we can do that by
> > relying on BTF more heavily (again, please check some of my proposals
> > in [0], [1], and discussion with Daniel in those threads). It will
> > feel more natural and be more straightforward to follow. It would be
> > great if you can lend a hand in implementing pieces of that plan!
> >
> > I'm currently on vacation, so my availability is very sparse, but I'd
> > be happy to discuss this further, if need be.
> >
> >   [0] https://lore.kernel.org/bpf/CAEf4BzbfdG2ub7gCi0OYqBrUoChVHWsmOntWAkJt47=FE+km+A@mail.gmail.com/
> >   [1] https://www.spinics.net/lists/bpf/msg03976.html
> >
> > > - Adds functionality to libbpf to support automatic pinning of maps when
> > >   loading an eBPF program, while re-using pinned maps if they already
> > >   exist (patches 2-3).
>
> For production use-cases, libbpf really need an easier higher-level API
> for re-using pinned maps, for establishing shared maps between
> programs.  The existing libbpf API bpf_object__pin_maps() and
> bpf_object__unpin_maps(), which don't re-use pinned maps, are not
> really usable, because they pin/unpin ALL maps in the ELF file.
>
> What users really need is an easy way to specify, on a per map basis,
> what kind of pinning and reuse/sharing they want.  E.g. like iproute2
> have, "global", "object-scope", and "no-pinning". ("ifindex-scope" would
> be nice for XDP).

I totally agree and I think this is easy to add both for BTF-defined
and "classic" bpf_map_def maps. Daniel mentioned in one of the
previous threads that in practice object-scope doesn't seem to be
used, so I'd say we should start with no-pinning + global pinning as
two initial supported values for pinning attribute. ifindex-scope is
interesting, but I'd love to hear a bit more about the use cases.

>   Today users have to split/reimplement bpf_prog_load_xattr(), and
> use/add bpf_map__reuse_fd().  Which is that I ended doing for

Honestly, bpf_prog_load_xattr() existence seems redundant to me. It's
basically just bpf_object__open + bpf_object__load. There is a piece
in the middle with "guessing" program types, but it should just be
moved into bpf_object__open and happen by default. Using open + load
gives more control and isn't really harder than bpf_prog_load_xattr.
bpf_prog_load_xattr which might be slightly more convenient for simple
use case, but falls apart immediately if you need to tune anything
before load.

> xdp-cpumap-tc[2] (used in production at ISP) resulting in 142 lines of
> extra code[3] that should have been hidden inside libbpf.  And worse,
> in this solution[4] the maps for reuse-pinning is specified in the code
> by name.  Thus, they cannot use a generic loader.  That I why, I want
> to mark the maps via a pinning member, like iproute2.
>
> I really hope this moves in a practical direction, as I have the next
> production request lined up (also from an ISP), and I hate to have to
> advice them to choose the same route as [3].

It seems to me that map pinning doesn't need much discussion at this
point, let's start with no-pinning + global pinning. To accommodate
pinning at custom root, bpf_object__open_xattr should accept extra
argument with non-default pinning root path. That should solve your
case completely, shouldn't it? Ultimately, with BTF-defined maps it
should be possible to specify custom pinning path on per-map basis for
cases where user needs ultimate non-uniform manual control.

>
>
> [2] https://github.com/xdp-project/xdp-cpumap-tc/
> [3] https://github.com/xdp-project/xdp-cpumap-tc/blob/master/src/xdp_iphash_to_cpu_user.c#L262-L403
> [4] https://github.com/xdp-project/xdp-cpumap-tc/blob/master/src/xdp_iphash_to_cpu_user.c#L431-L441
> --
> Best regards,
>   Jesper Dangaard Brouer
>   MSc.CS, Principal Kernel Engineer at Red Hat
>   LinkedIn: http://www.linkedin.com/in/brouer
Andrii Nakryiko Aug. 28, 2019, 8:40 p.m. UTC | #15
On Fri, Aug 23, 2019 at 4:29 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> [ ... snip ...]
>
> > E.g., today's API is essentially three steps:
> >
> > 1. open and parse ELF: collect relos, programs, map definitions
> > 2. load: create maps from collected defs, do program/global data/CO-RE
> > relocs, load and verify BPF programs
> > 3. attach programs one by one.
> >
> > Between step 1 and 2 user has flexibility to create more maps, set up
> > map-in-map, etc. Between 2 and 3 you can fill in global data, fill in
> > tail call maps, etc. That's already pretty flexible. But we can tune
> > and break apart those steps even further, if necessary.
>
> Today, steps 1 and 2 can be collapsed into a single call to
> bpf_prog_load_xattr(). As Jesper's mail explains, for XDP we don't
> generally want to do all the fancy rewriting stuff, we just want a
> simple way to load a program and get reusable pinning of maps.

I agree. See my response to Jesper's message. Note also my view of
bpf_prog_load_xattr() existence.

> Preferably in a way that is compatible with the iproute2 loader.
>
> So I really think we need two things:
>
> (1) a flexible API that splits up all the various steps in a way that
>     allows programs to inject their own map definitions before
>     relocations and loading
>
> (2) a simple convenience wrapper that loads an object file, does
>     something sensible with pinning and map-in-map definitions, and loads
>     everything into the kernel.

I agree. I think this wrapper is bpf_object__open + bpf_object__load
(bpf_prog_load_xattr will do as well, if you don't need to do anything
between open and load). I think pinning is simple to add in minimal
form and is pretty non-controversial (there is some ambiguity as to
how to handle merging of prog array maps, or maybe not just prog array
maps, but that can be controlled later through extra flags/attributes,
so I'd start with something sensible as a default behavior).

>
> I'd go so far as to say that (2) should even support system-wide
> configuration, similar to the /etc/iproute2/bpf_pinning file. E.g., an
> /etc/libbpf/pinning.conf file that sets the default pinning directory,
> and makes it possible to set up pin-value-to-subdir mappings like what
> iproute2 does today.

This I'm a bit hesitant about. It feels like it's not library's job to
read some system-wide configs modifying its behavior. We have all
those _xattr methods, which allow to override sensible defaults, I'd
try to go as far as possible with just that before doing
libbpf-specific /etc configs.

>
> Having (2) makes it more likely that all the different custom loaders
> will be compatible with each other, while still allowing people to do
> their own custom thing with (1). And of course, (2) could be implemented
> in terms of (1) internally in libbpf.
>
> In my ideal world, (2) would just use the definition format already in
> iproute2 (this is basically what I implemented already), but if you guys
> don't want to put this into libbpf, I can probably live with the default

I want to avoid having legacy-at-the-time-it-was-added code in libbpf
that we'd need to support for a long time, that solves only iproute2
cases, which is why I'm pushing back. With BTF we can support same
functionality in better form, which is what I want to prioritize and
which will be beneficial to the whole BPF ecosystem.

But I also want to make libbpf useful to iproute2 and other custom
loaders that have to support existing formats, and thus my proposal to
have libbpf provide granular enough APIs to augment default format in
non-intrusive way. Should this be callback-based or not is secondary,
though important to API design, concern.

> format being BTF-based instead. Which would mean that iproute2 I would
> end up with a flow like this:
>
> - When given an elf file, try to run it through the "standard loader"
>   (2). If this works, great, proceed to program attach.
>
> - If using (2) fails because it doesn't understand the map definition,
>   fall back to a compatibility loader that parses the legacy iproute2
>   map definition format and uses (1) to load that.
>
>
> Does the above make sense? :)

It does, yes. Also, with BTF enabled it should be easy to distinguish
between those two (e.g., was bpf_elf_map type used? if yes, then it's
a compatibility format) and not do extra work.

>
> -Toke
Andrii Nakryiko Feb. 3, 2020, 7:29 a.m. UTC | #16
On Wed, Aug 28, 2019 at 1:40 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Fri, Aug 23, 2019 at 4:29 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> >
> > [ ... snip ...]
> >
> > > E.g., today's API is essentially three steps:
> > >
> > > 1. open and parse ELF: collect relos, programs, map definitions
> > > 2. load: create maps from collected defs, do program/global data/CO-RE
> > > relocs, load and verify BPF programs
> > > 3. attach programs one by one.
> > >
> > > Between step 1 and 2 user has flexibility to create more maps, set up
> > > map-in-map, etc. Between 2 and 3 you can fill in global data, fill in
> > > tail call maps, etc. That's already pretty flexible. But we can tune
> > > and break apart those steps even further, if necessary.
> >
> > Today, steps 1 and 2 can be collapsed into a single call to
> > bpf_prog_load_xattr(). As Jesper's mail explains, for XDP we don't
> > generally want to do all the fancy rewriting stuff, we just want a
> > simple way to load a program and get reusable pinning of maps.
>
> I agree. See my response to Jesper's message. Note also my view of
> bpf_prog_load_xattr() existence.
>
> > Preferably in a way that is compatible with the iproute2 loader.
> >

Hi Toke,

I was wondering what's the state of converting iproute2 to use libbpf?
Is this still something you (or someone else) interested to do?

Briefly re-reading the thread, I think libbpf already has almost
everything to be used by iproute2. You've added map pinning, so with
bpf_map__set_pin_path() iproute2 should be able to specify pinning
path, according to its own logic. The only thing missing that I can
see is ability to specify numa_node, which we should add both to
BTF-defined map definitions (trivial change), as well as probably
expose a method like bpf_map__set_numa_node(struct bpf_map *map, int
numa_node) for non-declarative and non-BTF legacy cases.

There was concern about supporting "extended" bpf_map_def format of
iproute2 (bpf_elf_map, actually) with extra fields. I think it's
actually easy to handle as is without any extra new APIs.
bpf_object__open() w/ .relaxed_maps = true option will process
compatible 5 fields of bpf_map_def (type, key/value sizes,
max_entries, and map_flags) and will set up corresponding struct
bpf_map entries (but won't create BPF maps in kernel yet). Then
iproute2 can iterate over "maps" ELF section on its own, and see which
maps need to get some more adjustments before load phase: map-in-map
set up, numa node, pinning, etc. All those adjustments can be done
(except for numa yet) through existing libbpf APIs, as far as I can
tell. Once that is taken care of, proceed to bpf_object__load() and
other standard steps. No callbacks, no extra cruft.

Is there anything else that can block iproute2 conversion to libbpf?

BTW, I have a draft patches for declarative (BTF-based) map-in-map set
up and initialization the way I described it at Plumbers last year. So
while I'm finalizing that, thought I'll resurrect iproute2 thread and
see if we can get iproute2 migration to libbpf started.
Toke Høiland-Jørgensen Feb. 3, 2020, 7:34 p.m. UTC | #17
Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:

> On Wed, Aug 28, 2019 at 1:40 PM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
>>
>> On Fri, Aug 23, 2019 at 4:29 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>> >
>> > [ ... snip ...]
>> >
>> > > E.g., today's API is essentially three steps:
>> > >
>> > > 1. open and parse ELF: collect relos, programs, map definitions
>> > > 2. load: create maps from collected defs, do program/global data/CO-RE
>> > > relocs, load and verify BPF programs
>> > > 3. attach programs one by one.
>> > >
>> > > Between step 1 and 2 user has flexibility to create more maps, set up
>> > > map-in-map, etc. Between 2 and 3 you can fill in global data, fill in
>> > > tail call maps, etc. That's already pretty flexible. But we can tune
>> > > and break apart those steps even further, if necessary.
>> >
>> > Today, steps 1 and 2 can be collapsed into a single call to
>> > bpf_prog_load_xattr(). As Jesper's mail explains, for XDP we don't
>> > generally want to do all the fancy rewriting stuff, we just want a
>> > simple way to load a program and get reusable pinning of maps.
>>
>> I agree. See my response to Jesper's message. Note also my view of
>> bpf_prog_load_xattr() existence.
>>
>> > Preferably in a way that is compatible with the iproute2 loader.
>> >
>
> Hi Toke,
>
> I was wondering what's the state of converting iproute2 to use libbpf?
> Is this still something you (or someone else) interested to do?

Yeah, it's still on my list; planning to circle back to it once I have
finished an RFC implementation for XDP multiprog loading based on the
new function-replacing in the kernel.

(Not that this should keep anyone else from giving the conversion a go
and beating me to it :)).

> Briefly re-reading the thread, I think libbpf already has almost
> everything to be used by iproute2. You've added map pinning, so with
> bpf_map__set_pin_path() iproute2 should be able to specify pinning
> path, according to its own logic. The only thing missing that I can
> see is ability to specify numa_node, which we should add both to
> BTF-defined map definitions (trivial change), as well as probably
> expose a method like bpf_map__set_numa_node(struct bpf_map *map, int
> numa_node) for non-declarative and non-BTF legacy cases.

Yes, adding this to libbpf would be good.

> There was concern about supporting "extended" bpf_map_def format of
> iproute2 (bpf_elf_map, actually) with extra fields. I think it's
> actually easy to handle as is without any extra new APIs.
> bpf_object__open() w/ .relaxed_maps = true option will process
> compatible 5 fields of bpf_map_def (type, key/value sizes,
> max_entries, and map_flags) and will set up corresponding struct
> bpf_map entries (but won't create BPF maps in kernel yet). Then
> iproute2 can iterate over "maps" ELF section on its own, and see which
> maps need to get some more adjustments before load phase: map-in-map
> set up, numa node, pinning, etc. All those adjustments can be done
> (except for numa yet) through existing libbpf APIs, as far as I can
> tell. Once that is taken care of, proceed to bpf_object__load() and
> other standard steps. No callbacks, no extra cruft.
>
> Is there anything else that can block iproute2 conversion to libbpf?

I haven't looked into the details since my last RFC conversion series,
but from what I recall from that, and what we've been changing in libbpf
since, I was basically planning to do what you explained. So while there
are some details to work out, I believe it's basically straight forward,
and I can't think of anything that should block it.

> BTW, I have a draft patches for declarative (BTF-based) map-in-map set
> up and initialization the way I described it at Plumbers last year. So
> while I'm finalizing that, thought I'll resurrect iproute2 thread and
> see if we can get iproute2 migration to libbpf started.

Great! FWIW, as long as we have the legacy compatibility code in
iproute2, I don't think it'll be a problem (or a blocker for the
conversion) if BTF-defined maps can't express all the same things as the
legacy maps. The missing bits will come automatically as libbpf is
updated. But great to hear that you're working on this :)

-Toke
Andrii Nakryiko Feb. 4, 2020, 12:56 a.m. UTC | #18
On Mon, Feb 3, 2020 at 11:34 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
>
> > On Wed, Aug 28, 2019 at 1:40 PM Andrii Nakryiko
> > <andrii.nakryiko@gmail.com> wrote:
> >>
> >> On Fri, Aug 23, 2019 at 4:29 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> >> >
> >> > [ ... snip ...]
> >> >
> >> > > E.g., today's API is essentially three steps:
> >> > >
> >> > > 1. open and parse ELF: collect relos, programs, map definitions
> >> > > 2. load: create maps from collected defs, do program/global data/CO-RE
> >> > > relocs, load and verify BPF programs
> >> > > 3. attach programs one by one.
> >> > >
> >> > > Between step 1 and 2 user has flexibility to create more maps, set up
> >> > > map-in-map, etc. Between 2 and 3 you can fill in global data, fill in
> >> > > tail call maps, etc. That's already pretty flexible. But we can tune
> >> > > and break apart those steps even further, if necessary.
> >> >
> >> > Today, steps 1 and 2 can be collapsed into a single call to
> >> > bpf_prog_load_xattr(). As Jesper's mail explains, for XDP we don't
> >> > generally want to do all the fancy rewriting stuff, we just want a
> >> > simple way to load a program and get reusable pinning of maps.
> >>
> >> I agree. See my response to Jesper's message. Note also my view of
> >> bpf_prog_load_xattr() existence.
> >>
> >> > Preferably in a way that is compatible with the iproute2 loader.
> >> >
> >
> > Hi Toke,
> >
> > I was wondering what's the state of converting iproute2 to use libbpf?
> > Is this still something you (or someone else) interested to do?
>
> Yeah, it's still on my list; planning to circle back to it once I have
> finished an RFC implementation for XDP multiprog loading based on the
> new function-replacing in the kernel.
>
> (Not that this should keep anyone else from giving the conversion a go
> and beating me to it :)).
>
> > Briefly re-reading the thread, I think libbpf already has almost
> > everything to be used by iproute2. You've added map pinning, so with
> > bpf_map__set_pin_path() iproute2 should be able to specify pinning
> > path, according to its own logic. The only thing missing that I can
> > see is ability to specify numa_node, which we should add both to
> > BTF-defined map definitions (trivial change), as well as probably
> > expose a method like bpf_map__set_numa_node(struct bpf_map *map, int
> > numa_node) for non-declarative and non-BTF legacy cases.
>
> Yes, adding this to libbpf would be good.
>
> > There was concern about supporting "extended" bpf_map_def format of
> > iproute2 (bpf_elf_map, actually) with extra fields. I think it's
> > actually easy to handle as is without any extra new APIs.
> > bpf_object__open() w/ .relaxed_maps = true option will process
> > compatible 5 fields of bpf_map_def (type, key/value sizes,
> > max_entries, and map_flags) and will set up corresponding struct
> > bpf_map entries (but won't create BPF maps in kernel yet). Then
> > iproute2 can iterate over "maps" ELF section on its own, and see which
> > maps need to get some more adjustments before load phase: map-in-map
> > set up, numa node, pinning, etc. All those adjustments can be done
> > (except for numa yet) through existing libbpf APIs, as far as I can
> > tell. Once that is taken care of, proceed to bpf_object__load() and
> > other standard steps. No callbacks, no extra cruft.
> >
> > Is there anything else that can block iproute2 conversion to libbpf?
>
> I haven't looked into the details since my last RFC conversion series,
> but from what I recall from that, and what we've been changing in libbpf
> since, I was basically planning to do what you explained. So while there
> are some details to work out, I believe it's basically straight forward,
> and I can't think of anything that should block it.
>

Great! Just to disambiguate and make sure we are in agreement, my hope
here is that iproute2 can completely delegate to libbpf all the ELF
parsing, map creation, program loading, etc (including all the new
stuff like global variables, etc). And only for legacy maps in
SEC("maps"), it would have to parse that *single* ELF section (again,
on its own) and see if there are any extra features of struct
bpf_elf_map requested (i.e., numa, map-in-map, pinning), and if yes,
it would use programmatic libbpf APIs to set this up. It might need to
do additional BPF_PROG_ARRAY set up after BPF programs are loaded
(because iproute2 has its custom naming-based convention). But
hopefully we'll encourage people to gradually migrate to BTF-defined
maps with declarative ways of doing all that.

> > BTW, I have a draft patches for declarative (BTF-based) map-in-map set
> > up and initialization the way I described it at Plumbers last year. So
> > while I'm finalizing that, thought I'll resurrect iproute2 thread and
> > see if we can get iproute2 migration to libbpf started.
>
> Great! FWIW, as long as we have the legacy compatibility code in
> iproute2, I don't think it'll be a problem (or a blocker for the
> conversion) if BTF-defined maps can't express all the same things as the
> legacy maps. The missing bits will come automatically as libbpf is
> updated. But great to hear that you're working on this :)
>
> -Toke
>
David Ahern Feb. 4, 2020, 1:46 a.m. UTC | #19
On 2/3/20 5:56 PM, Andrii Nakryiko wrote:
> Great! Just to disambiguate and make sure we are in agreement, my hope
> here is that iproute2 can completely delegate to libbpf all the ELF
>

iproute2 needs to compile and continue working as is when libbpf is not
available. e.g., add check in configure to define HAVE_LIBBPF and move
the existing code and move under else branch.
Andrii Nakryiko Feb. 4, 2020, 3:41 a.m. UTC | #20
On Mon, Feb 3, 2020 at 5:46 PM David Ahern <dsahern@gmail.com> wrote:
>
> On 2/3/20 5:56 PM, Andrii Nakryiko wrote:
> > Great! Just to disambiguate and make sure we are in agreement, my hope
> > here is that iproute2 can completely delegate to libbpf all the ELF
> >
>
> iproute2 needs to compile and continue working as is when libbpf is not
> available. e.g., add check in configure to define HAVE_LIBBPF and move
> the existing code and move under else branch.

Wouldn't it be better to statically compile against libbpf in this
case and get rid a lot of BPF-related code and simplify the rest of
it? This can be easily done by using libbpf through submodule, the
same way as BCC and pahole do it.
David Ahern Feb. 4, 2020, 4:52 a.m. UTC | #21
On 2/3/20 8:41 PM, Andrii Nakryiko wrote:
> On Mon, Feb 3, 2020 at 5:46 PM David Ahern <dsahern@gmail.com> wrote:
>>
>> On 2/3/20 5:56 PM, Andrii Nakryiko wrote:
>>> Great! Just to disambiguate and make sure we are in agreement, my hope
>>> here is that iproute2 can completely delegate to libbpf all the ELF
>>>
>>
>> iproute2 needs to compile and continue working as is when libbpf is not
>> available. e.g., add check in configure to define HAVE_LIBBPF and move
>> the existing code and move under else branch.
> 
> Wouldn't it be better to statically compile against libbpf in this
> case and get rid a lot of BPF-related code and simplify the rest of
> it? This can be easily done by using libbpf through submodule, the
> same way as BCC and pahole do it.
> 

iproute2 compiles today and runs on older distributions and older
distributions with newer kernels. That needs to hold true after the move
to libbpf.
Andrii Nakryiko Feb. 4, 2020, 5 a.m. UTC | #22
On Mon, Feb 3, 2020 at 8:53 PM David Ahern <dsahern@gmail.com> wrote:
>
> On 2/3/20 8:41 PM, Andrii Nakryiko wrote:
> > On Mon, Feb 3, 2020 at 5:46 PM David Ahern <dsahern@gmail.com> wrote:
> >>
> >> On 2/3/20 5:56 PM, Andrii Nakryiko wrote:
> >>> Great! Just to disambiguate and make sure we are in agreement, my hope
> >>> here is that iproute2 can completely delegate to libbpf all the ELF
> >>>
> >>
> >> iproute2 needs to compile and continue working as is when libbpf is not
> >> available. e.g., add check in configure to define HAVE_LIBBPF and move
> >> the existing code and move under else branch.
> >
> > Wouldn't it be better to statically compile against libbpf in this
> > case and get rid a lot of BPF-related code and simplify the rest of
> > it? This can be easily done by using libbpf through submodule, the
> > same way as BCC and pahole do it.
> >
>
> iproute2 compiles today and runs on older distributions and older
> distributions with newer kernels. That needs to hold true after the move
> to libbpf.

And by statically compiling against libbpf, checked out as a
submodule, that will still hold true, wouldn't it? Or there is some
complications I'm missing? Libbpf is designed to handle old kernels
with no problems.
Toke Høiland-Jørgensen Feb. 4, 2020, 8:25 a.m. UTC | #23
Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:

> On Mon, Feb 3, 2020 at 8:53 PM David Ahern <dsahern@gmail.com> wrote:
>>
>> On 2/3/20 8:41 PM, Andrii Nakryiko wrote:
>> > On Mon, Feb 3, 2020 at 5:46 PM David Ahern <dsahern@gmail.com> wrote:
>> >>
>> >> On 2/3/20 5:56 PM, Andrii Nakryiko wrote:
>> >>> Great! Just to disambiguate and make sure we are in agreement, my hope
>> >>> here is that iproute2 can completely delegate to libbpf all the ELF
>> >>>
>> >>
>> >> iproute2 needs to compile and continue working as is when libbpf is not
>> >> available. e.g., add check in configure to define HAVE_LIBBPF and move
>> >> the existing code and move under else branch.
>> >
>> > Wouldn't it be better to statically compile against libbpf in this
>> > case and get rid a lot of BPF-related code and simplify the rest of
>> > it? This can be easily done by using libbpf through submodule, the
>> > same way as BCC and pahole do it.
>> >
>>
>> iproute2 compiles today and runs on older distributions and older
>> distributions with newer kernels. That needs to hold true after the move
>> to libbpf.
>
> And by statically compiling against libbpf, checked out as a
> submodule, that will still hold true, wouldn't it? Or there is some
> complications I'm missing? Libbpf is designed to handle old kernels
> with no problems.

My plan was to use the same configure test I'm using for xdp-tools
(where I in turn copied the structure of the configure script from
iproute2):

https://github.com/xdp-project/xdp-tools/blob/master/configure#L59

This will look for a system libbpf install and compile against it if it
is compatible, and otherwise fall back to a statically linking against a
git submodule.

We'll need to double-check that this will work on everything currently
supported by iproute2, and fix libbpf if there are any issues with that.
Not that I foresee any, but you never know :)

-Toke
Toke Høiland-Jørgensen Feb. 4, 2020, 8:27 a.m. UTC | #24
Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:

> On Mon, Feb 3, 2020 at 11:34 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>
>> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
>>
>> > On Wed, Aug 28, 2019 at 1:40 PM Andrii Nakryiko
>> > <andrii.nakryiko@gmail.com> wrote:
>> >>
>> >> On Fri, Aug 23, 2019 at 4:29 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>> >> >
>> >> > [ ... snip ...]
>> >> >
>> >> > > E.g., today's API is essentially three steps:
>> >> > >
>> >> > > 1. open and parse ELF: collect relos, programs, map definitions
>> >> > > 2. load: create maps from collected defs, do program/global data/CO-RE
>> >> > > relocs, load and verify BPF programs
>> >> > > 3. attach programs one by one.
>> >> > >
>> >> > > Between step 1 and 2 user has flexibility to create more maps, set up
>> >> > > map-in-map, etc. Between 2 and 3 you can fill in global data, fill in
>> >> > > tail call maps, etc. That's already pretty flexible. But we can tune
>> >> > > and break apart those steps even further, if necessary.
>> >> >
>> >> > Today, steps 1 and 2 can be collapsed into a single call to
>> >> > bpf_prog_load_xattr(). As Jesper's mail explains, for XDP we don't
>> >> > generally want to do all the fancy rewriting stuff, we just want a
>> >> > simple way to load a program and get reusable pinning of maps.
>> >>
>> >> I agree. See my response to Jesper's message. Note also my view of
>> >> bpf_prog_load_xattr() existence.
>> >>
>> >> > Preferably in a way that is compatible with the iproute2 loader.
>> >> >
>> >
>> > Hi Toke,
>> >
>> > I was wondering what's the state of converting iproute2 to use libbpf?
>> > Is this still something you (or someone else) interested to do?
>>
>> Yeah, it's still on my list; planning to circle back to it once I have
>> finished an RFC implementation for XDP multiprog loading based on the
>> new function-replacing in the kernel.
>>
>> (Not that this should keep anyone else from giving the conversion a go
>> and beating me to it :)).
>>
>> > Briefly re-reading the thread, I think libbpf already has almost
>> > everything to be used by iproute2. You've added map pinning, so with
>> > bpf_map__set_pin_path() iproute2 should be able to specify pinning
>> > path, according to its own logic. The only thing missing that I can
>> > see is ability to specify numa_node, which we should add both to
>> > BTF-defined map definitions (trivial change), as well as probably
>> > expose a method like bpf_map__set_numa_node(struct bpf_map *map, int
>> > numa_node) for non-declarative and non-BTF legacy cases.
>>
>> Yes, adding this to libbpf would be good.
>>
>> > There was concern about supporting "extended" bpf_map_def format of
>> > iproute2 (bpf_elf_map, actually) with extra fields. I think it's
>> > actually easy to handle as is without any extra new APIs.
>> > bpf_object__open() w/ .relaxed_maps = true option will process
>> > compatible 5 fields of bpf_map_def (type, key/value sizes,
>> > max_entries, and map_flags) and will set up corresponding struct
>> > bpf_map entries (but won't create BPF maps in kernel yet). Then
>> > iproute2 can iterate over "maps" ELF section on its own, and see which
>> > maps need to get some more adjustments before load phase: map-in-map
>> > set up, numa node, pinning, etc. All those adjustments can be done
>> > (except for numa yet) through existing libbpf APIs, as far as I can
>> > tell. Once that is taken care of, proceed to bpf_object__load() and
>> > other standard steps. No callbacks, no extra cruft.
>> >
>> > Is there anything else that can block iproute2 conversion to libbpf?
>>
>> I haven't looked into the details since my last RFC conversion series,
>> but from what I recall from that, and what we've been changing in libbpf
>> since, I was basically planning to do what you explained. So while there
>> are some details to work out, I believe it's basically straight forward,
>> and I can't think of anything that should block it.
>>
>
> Great! Just to disambiguate and make sure we are in agreement, my hope
> here is that iproute2 can completely delegate to libbpf all the ELF
> parsing, map creation, program loading, etc (including all the new
> stuff like global variables, etc). And only for legacy maps in
> SEC("maps"), it would have to parse that *single* ELF section (again,
> on its own) and see if there are any extra features of struct
> bpf_elf_map requested (i.e., numa, map-in-map, pinning), and if yes,
> it would use programmatic libbpf APIs to set this up. It might need to
> do additional BPF_PROG_ARRAY set up after BPF programs are loaded
> (because iproute2 has its custom naming-based convention). But
> hopefully we'll encourage people to gradually migrate to BTF-defined
> maps with declarative ways of doing all that.

Yup, that is my hope as well. Let's see how it goes :)

-Toke
Andrii Nakryiko Feb. 4, 2020, 6:47 p.m. UTC | #25
On Tue, Feb 4, 2020 at 12:25 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
>
> > On Mon, Feb 3, 2020 at 8:53 PM David Ahern <dsahern@gmail.com> wrote:
> >>
> >> On 2/3/20 8:41 PM, Andrii Nakryiko wrote:
> >> > On Mon, Feb 3, 2020 at 5:46 PM David Ahern <dsahern@gmail.com> wrote:
> >> >>
> >> >> On 2/3/20 5:56 PM, Andrii Nakryiko wrote:
> >> >>> Great! Just to disambiguate and make sure we are in agreement, my hope
> >> >>> here is that iproute2 can completely delegate to libbpf all the ELF
> >> >>>
> >> >>
> >> >> iproute2 needs to compile and continue working as is when libbpf is not
> >> >> available. e.g., add check in configure to define HAVE_LIBBPF and move
> >> >> the existing code and move under else branch.
> >> >
> >> > Wouldn't it be better to statically compile against libbpf in this
> >> > case and get rid a lot of BPF-related code and simplify the rest of
> >> > it? This can be easily done by using libbpf through submodule, the
> >> > same way as BCC and pahole do it.
> >> >
> >>
> >> iproute2 compiles today and runs on older distributions and older
> >> distributions with newer kernels. That needs to hold true after the move
> >> to libbpf.
> >
> > And by statically compiling against libbpf, checked out as a
> > submodule, that will still hold true, wouldn't it? Or there is some
> > complications I'm missing? Libbpf is designed to handle old kernels
> > with no problems.
>
> My plan was to use the same configure test I'm using for xdp-tools
> (where I in turn copied the structure of the configure script from
> iproute2):
>
> https://github.com/xdp-project/xdp-tools/blob/master/configure#L59
>
> This will look for a system libbpf install and compile against it if it
> is compatible, and otherwise fall back to a statically linking against a
> git submodule.

How will this work when build host has libbpf installed, but target
host doesn't? You'll get dynamic linker error when trying to run that
tool.

If the goal is to have a reliable tool working everywhere, and you
already support having libbpf as a submodule, why not always use
submodule's libbpf? What's the concern? Libbpf is a small library, I
don't think a binary size argument is enough reason to not do this. On
the other hand, by using libbpf from submodule, your tool is built
*and tested* with a well-known libbpf version that tool-producer
controls.

>
> We'll need to double-check that this will work on everything currently
> supported by iproute2, and fix libbpf if there are any issues with that.
> Not that I foresee any, but you never know :)
>
> -Toke
>
Toke Høiland-Jørgensen Feb. 4, 2020, 7:19 p.m. UTC | #26
Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:

> On Tue, Feb 4, 2020 at 12:25 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>
>> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
>>
>> > On Mon, Feb 3, 2020 at 8:53 PM David Ahern <dsahern@gmail.com> wrote:
>> >>
>> >> On 2/3/20 8:41 PM, Andrii Nakryiko wrote:
>> >> > On Mon, Feb 3, 2020 at 5:46 PM David Ahern <dsahern@gmail.com> wrote:
>> >> >>
>> >> >> On 2/3/20 5:56 PM, Andrii Nakryiko wrote:
>> >> >>> Great! Just to disambiguate and make sure we are in agreement, my hope
>> >> >>> here is that iproute2 can completely delegate to libbpf all the ELF
>> >> >>>
>> >> >>
>> >> >> iproute2 needs to compile and continue working as is when libbpf is not
>> >> >> available. e.g., add check in configure to define HAVE_LIBBPF and move
>> >> >> the existing code and move under else branch.
>> >> >
>> >> > Wouldn't it be better to statically compile against libbpf in this
>> >> > case and get rid a lot of BPF-related code and simplify the rest of
>> >> > it? This can be easily done by using libbpf through submodule, the
>> >> > same way as BCC and pahole do it.
>> >> >
>> >>
>> >> iproute2 compiles today and runs on older distributions and older
>> >> distributions with newer kernels. That needs to hold true after the move
>> >> to libbpf.
>> >
>> > And by statically compiling against libbpf, checked out as a
>> > submodule, that will still hold true, wouldn't it? Or there is some
>> > complications I'm missing? Libbpf is designed to handle old kernels
>> > with no problems.
>>
>> My plan was to use the same configure test I'm using for xdp-tools
>> (where I in turn copied the structure of the configure script from
>> iproute2):
>>
>> https://github.com/xdp-project/xdp-tools/blob/master/configure#L59
>>
>> This will look for a system libbpf install and compile against it if it
>> is compatible, and otherwise fall back to a statically linking against a
>> git submodule.
>
> How will this work when build host has libbpf installed, but target
> host doesn't? You'll get dynamic linker error when trying to run that
> tool.

That's called dependency tracking; distros have various ways of going
about that :)

But yeah, if you're going to do you own cross-compilation, you'd
probably want to just force using the static library.

> If the goal is to have a reliable tool working everywhere, and you
> already support having libbpf as a submodule, why not always use
> submodule's libbpf? What's the concern? Libbpf is a small library, I
> don't think a binary size argument is enough reason to not do this. On
> the other hand, by using libbpf from submodule, your tool is built
> *and tested* with a well-known libbpf version that tool-producer
> controls.

I thought we already had this discussion? :)

libbpf is a library like any other. Distros that package the library
want the tools that use it to be dynamically linked against it so
library upgrades (especially of the CVE-fixing kind) get picked up by
all users. Other distros have memory and space constraints (iproute2 is
shipped on OpenWrt, for instance, which is *extremely*
space-constrained). And yeah, other deployments don't care and will just
statically compile in the vendored version. So we'll need to support all
of those use cases.

-Toke
Andrii Nakryiko Feb. 4, 2020, 7:29 p.m. UTC | #27
On Tue, Feb 4, 2020 at 11:19 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
>
> > On Tue, Feb 4, 2020 at 12:25 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> >>
> >> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
> >>
> >> > On Mon, Feb 3, 2020 at 8:53 PM David Ahern <dsahern@gmail.com> wrote:
> >> >>
> >> >> On 2/3/20 8:41 PM, Andrii Nakryiko wrote:
> >> >> > On Mon, Feb 3, 2020 at 5:46 PM David Ahern <dsahern@gmail.com> wrote:
> >> >> >>
> >> >> >> On 2/3/20 5:56 PM, Andrii Nakryiko wrote:
> >> >> >>> Great! Just to disambiguate and make sure we are in agreement, my hope
> >> >> >>> here is that iproute2 can completely delegate to libbpf all the ELF
> >> >> >>>
> >> >> >>
> >> >> >> iproute2 needs to compile and continue working as is when libbpf is not
> >> >> >> available. e.g., add check in configure to define HAVE_LIBBPF and move
> >> >> >> the existing code and move under else branch.
> >> >> >
> >> >> > Wouldn't it be better to statically compile against libbpf in this
> >> >> > case and get rid a lot of BPF-related code and simplify the rest of
> >> >> > it? This can be easily done by using libbpf through submodule, the
> >> >> > same way as BCC and pahole do it.
> >> >> >
> >> >>
> >> >> iproute2 compiles today and runs on older distributions and older
> >> >> distributions with newer kernels. That needs to hold true after the move
> >> >> to libbpf.
> >> >
> >> > And by statically compiling against libbpf, checked out as a
> >> > submodule, that will still hold true, wouldn't it? Or there is some
> >> > complications I'm missing? Libbpf is designed to handle old kernels
> >> > with no problems.
> >>
> >> My plan was to use the same configure test I'm using for xdp-tools
> >> (where I in turn copied the structure of the configure script from
> >> iproute2):
> >>
> >> https://github.com/xdp-project/xdp-tools/blob/master/configure#L59
> >>
> >> This will look for a system libbpf install and compile against it if it
> >> is compatible, and otherwise fall back to a statically linking against a
> >> git submodule.
> >
> > How will this work when build host has libbpf installed, but target
> > host doesn't? You'll get dynamic linker error when trying to run that
> > tool.
>
> That's called dependency tracking; distros have various ways of going
> about that :)

I'm confused, honestly. libbpf is either a dependency and thus can be
relied upon to be present in the target system, or it's not and this
whole dance with detecting libbpf presence needs to be performed.

If libbpf is optional, then I don't see how iproute2 BPF-related code
and complexity can be reduced at all, given it should still support
loading BPF programs even without libbpf. Furthermore, given libbpf
supports more features already and will probably be outpacing
iproute2's own BPF support in the future, some users will start
relying on BPF features supported only by libbpf "backend", so
iproute2's own BPF backend will just fail to load such programs,
bringing unpleasant surprises, potentially. So I still fail to see how
libbpf can be optional and what benefit does that bring. But shared or
static - whatever fits iproute2 best, no preferences.

>
> But yeah, if you're going to do you own cross-compilation, you'd
> probably want to just force using the static library.
>
> > If the goal is to have a reliable tool working everywhere, and you
> > already support having libbpf as a submodule, why not always use
> > submodule's libbpf? What's the concern? Libbpf is a small library, I
> > don't think a binary size argument is enough reason to not do this. On
> > the other hand, by using libbpf from submodule, your tool is built
> > *and tested* with a well-known libbpf version that tool-producer
> > controls.
>
> I thought we already had this discussion? :)

Yeah, I guess we did for bpftool :) No worries, I don't care that
much, just see my notes above about optional libbpf and consequences.

>
> libbpf is a library like any other. Distros that package the library
> want the tools that use it to be dynamically linked against it so
> library upgrades (especially of the CVE-fixing kind) get picked up by
> all users. Other distros have memory and space constraints (iproute2 is
> shipped on OpenWrt, for instance, which is *extremely*
> space-constrained). And yeah, other deployments don't care and will just
> statically compile in the vendored version. So we'll need to support all
> of those use cases.
>
> -Toke
>
Toke Høiland-Jørgensen Feb. 4, 2020, 9:56 p.m. UTC | #28
Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:

> On Tue, Feb 4, 2020 at 11:19 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>
>> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
>>
>> > On Tue, Feb 4, 2020 at 12:25 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>> >>
>> >> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
>> >>
>> >> > On Mon, Feb 3, 2020 at 8:53 PM David Ahern <dsahern@gmail.com> wrote:
>> >> >>
>> >> >> On 2/3/20 8:41 PM, Andrii Nakryiko wrote:
>> >> >> > On Mon, Feb 3, 2020 at 5:46 PM David Ahern <dsahern@gmail.com> wrote:
>> >> >> >>
>> >> >> >> On 2/3/20 5:56 PM, Andrii Nakryiko wrote:
>> >> >> >>> Great! Just to disambiguate and make sure we are in agreement, my hope
>> >> >> >>> here is that iproute2 can completely delegate to libbpf all the ELF
>> >> >> >>>
>> >> >> >>
>> >> >> >> iproute2 needs to compile and continue working as is when libbpf is not
>> >> >> >> available. e.g., add check in configure to define HAVE_LIBBPF and move
>> >> >> >> the existing code and move under else branch.
>> >> >> >
>> >> >> > Wouldn't it be better to statically compile against libbpf in this
>> >> >> > case and get rid a lot of BPF-related code and simplify the rest of
>> >> >> > it? This can be easily done by using libbpf through submodule, the
>> >> >> > same way as BCC and pahole do it.
>> >> >> >
>> >> >>
>> >> >> iproute2 compiles today and runs on older distributions and older
>> >> >> distributions with newer kernels. That needs to hold true after the move
>> >> >> to libbpf.
>> >> >
>> >> > And by statically compiling against libbpf, checked out as a
>> >> > submodule, that will still hold true, wouldn't it? Or there is some
>> >> > complications I'm missing? Libbpf is designed to handle old kernels
>> >> > with no problems.
>> >>
>> >> My plan was to use the same configure test I'm using for xdp-tools
>> >> (where I in turn copied the structure of the configure script from
>> >> iproute2):
>> >>
>> >> https://github.com/xdp-project/xdp-tools/blob/master/configure#L59
>> >>
>> >> This will look for a system libbpf install and compile against it if it
>> >> is compatible, and otherwise fall back to a statically linking against a
>> >> git submodule.
>> >
>> > How will this work when build host has libbpf installed, but target
>> > host doesn't? You'll get dynamic linker error when trying to run that
>> > tool.
>>
>> That's called dependency tracking; distros have various ways of going
>> about that :)
>
> I'm confused, honestly. libbpf is either a dependency and thus can be
> relied upon to be present in the target system, or it's not and this
> whole dance with detecting libbpf presence needs to be performed.

Yes, and iproute2 is likely to be built in both sorts of environments,
so we will have to support both :)

> If libbpf is optional, then I don't see how iproute2 BPF-related code
> and complexity can be reduced at all, given it should still support
> loading BPF programs even without libbpf. Furthermore, given libbpf
> supports more features already and will probably be outpacing
> iproute2's own BPF support in the future, some users will start
> relying on BPF features supported only by libbpf "backend", so
> iproute2's own BPF backend will just fail to load such programs,
> bringing unpleasant surprises, potentially. So I still fail to see how
> libbpf can be optional and what benefit does that bring.

I wasn't saying that libbpf itself should be optional; if we're porting
things, we should rip out as much of the old code as we can. I just
meant that we should support both modes of building, so distros that
*do* build libbpf as a library can link iproute2 against that with as
little friction as possible.

I'm dead set on a specific auto-detection semantic either; I guess it'll
be up to the iproute2 maintainers whether they prefer defaulting to one
or the other.

> But shared or static - whatever fits iproute2 best, no preferences.

Right, cool, I think we are basically agreed, given the above :)

-Toke
David Ahern Feb. 4, 2020, 10:12 p.m. UTC | #29
On 2/4/20 2:56 PM, Toke Høiland-Jørgensen wrote:
>> I'm confused, honestly. libbpf is either a dependency and thus can be
>> relied upon to be present in the target system, or it's not and this
>> whole dance with detecting libbpf presence needs to be performed.
> 
> Yes, and iproute2 is likely to be built in both sorts of environments,
> so we will have to support both :)
> 
>> If libbpf is optional, then I don't see how iproute2 BPF-related code
>> and complexity can be reduced at all, given it should still support
>> loading BPF programs even without libbpf. Furthermore, given libbpf
>> supports more features already and will probably be outpacing
>> iproute2's own BPF support in the future, some users will start
>> relying on BPF features supported only by libbpf "backend", so
>> iproute2's own BPF backend will just fail to load such programs,
>> bringing unpleasant surprises, potentially. So I still fail to see how
>> libbpf can be optional and what benefit does that bring.
> 
> I wasn't saying that libbpf itself should be optional; if we're porting
> things, we should rip out as much of the old code as we can. I just
> meant that we should support both modes of building, so distros that
> *do* build libbpf as a library can link iproute2 against that with as
> little friction as possible.
> 
> I'm dead set on a specific auto-detection semantic either; I guess it'll
> be up to the iproute2 maintainers whether they prefer defaulting to one
> or the other.
> 

A few concerns from my perspective:

1. Right now ip comes in around 650k unstripped; libbpf.a for 0.0.7 is
around 1.2M with the size of libbpf.o > than ip. Most likely, making
iproute2 use libbpf statically is going to be challenging and I am not
sure it is the right thing to do (unless the user is building a static
version of iproute2 commands).

2. git submodules can be a PITA to deal with (e.g., jumping between
branches and versions), so there needs to be a good reason for it.

3. iproute2 code needs to build for a wide range of OSes and not lose
functionality compared to what it has today.
Toke Høiland-Jørgensen Feb. 4, 2020, 10:35 p.m. UTC | #30
David Ahern <dsahern@gmail.com> writes:

> On 2/4/20 2:56 PM, Toke Høiland-Jørgensen wrote:
>>> I'm confused, honestly. libbpf is either a dependency and thus can be
>>> relied upon to be present in the target system, or it's not and this
>>> whole dance with detecting libbpf presence needs to be performed.
>> 
>> Yes, and iproute2 is likely to be built in both sorts of environments,
>> so we will have to support both :)
>> 
>>> If libbpf is optional, then I don't see how iproute2 BPF-related code
>>> and complexity can be reduced at all, given it should still support
>>> loading BPF programs even without libbpf. Furthermore, given libbpf
>>> supports more features already and will probably be outpacing
>>> iproute2's own BPF support in the future, some users will start
>>> relying on BPF features supported only by libbpf "backend", so
>>> iproute2's own BPF backend will just fail to load such programs,
>>> bringing unpleasant surprises, potentially. So I still fail to see how
>>> libbpf can be optional and what benefit does that bring.
>> 
>> I wasn't saying that libbpf itself should be optional; if we're porting
>> things, we should rip out as much of the old code as we can. I just
>> meant that we should support both modes of building, so distros that
>> *do* build libbpf as a library can link iproute2 against that with as
>> little friction as possible.
>> 
>> I'm dead set on a specific auto-detection semantic either; I guess it'll
>> be up to the iproute2 maintainers whether they prefer defaulting to one
>> or the other.
>> 
>
> A few concerns from my perspective:
>
> 1. Right now ip comes in around 650k unstripped; libbpf.a for 0.0.7 is
> around 1.2M with the size of libbpf.o > than ip.

Hmm, I'm getting ~700k for libbpf.a and libbpf.so.0.0.7 is ~480k (for
whichever kernel I currently have checked out). But lib/bpf.o in
iproute2 is only 80k, so fair point :)

> Most likely, making iproute2 use libbpf statically is going to be
> challenging and I am not sure it is the right thing to do (unless the
> user is building a static version of iproute2 commands).

Linking dynamically would imply a new dependency. I'm not necessarily
against that, but would it be acceptable from your PoV? And if so,
should we keep the current internal BPF code for when libbpf is not
available, or would it be acceptable to not be able to load BPF programs
if libbpf is not present (similar to how the libelf dependency works
today)?

> 2. git submodules can be a PITA to deal with (e.g., jumping between
> branches and versions), so there needs to be a good reason for it.

Yes, totally with you on that. Another option could be to just copy the
files into the iproute2 tree, and update them the same way the kernel
headers are? Or maybe doing fancy things like this:
https://github.com/apenwarr/git-subtrac

> 3. iproute2 code needs to build for a wide range of OSes and not lose
> functionality compared to what it has today.

Could you be a bit more specific about "a wide range of OSes"? I guess
we could do the work to make sure libbpf builds on all the same
platforms iproute2 supports, but we'd need something a bit more definite
to go on...

-Toke
David Ahern Feb. 4, 2020, 11:13 p.m. UTC | #31
On 2/4/20 3:35 PM, Toke Høiland-Jørgensen wrote:
> 
>> Most likely, making iproute2 use libbpf statically is going to be
>> challenging and I am not sure it is the right thing to do (unless the
>> user is building a static version of iproute2 commands).
> 
> Linking dynamically would imply a new dependency. I'm not necessarily
> against that, but would it be acceptable from your PoV? And if so,
> should we keep the current internal BPF code for when libbpf is not
> available, or would it be acceptable to not be able to load BPF programs
> if libbpf is not present (similar to how the libelf dependency works
> today)?

iproute2 recently gained the libmnl dependency for extack. Seems like
libbpf falls into the similar category.

> 
>> 2. git submodules can be a PITA to deal with (e.g., jumping between
>> branches and versions), so there needs to be a good reason for it.
> 
> Yes, totally with you on that. Another option could be to just copy the
> files into the iproute2 tree, and update them the same way the kernel
> headers are? Or maybe doing fancy things like this:
> https://github.com/apenwarr/git-subtrac

kernel uapi is a totally different reason to import the headers. bpf
functionality is an add-on.

I would like to see iproute2 work with libbpf. Given libbpf's current
status and availability across OS'es that is going to be a challenge for
a lot of OS'es which is why I suggested the HAVE_LIBBPF check falls back
to existing code if libbpf is not installed.

> 
>> 3. iproute2 code needs to build for a wide range of OSes and not lose
>> functionality compared to what it has today.
> 
> Could you be a bit more specific about "a wide range of OSes"? I guess
> we could do the work to make sure libbpf builds on all the same
> platforms iproute2 supports, but we'd need something a bit more definite
> to go on...
> 

rhel5/centos5? definitely rhel6/centos6 time frame and forward.

Stephen: has the backwards lifetime ever been stated?

Changing configure to check for existence and fall back to existing code
seems to me the safest option.
Toke Høiland-Jørgensen Feb. 5, 2020, 10:37 a.m. UTC | #32
David Ahern <dsahern@gmail.com> writes:

> On 2/4/20 3:35 PM, Toke Høiland-Jørgensen wrote:
>> 
>>> Most likely, making iproute2 use libbpf statically is going to be
>>> challenging and I am not sure it is the right thing to do (unless the
>>> user is building a static version of iproute2 commands).
>> 
>> Linking dynamically would imply a new dependency. I'm not necessarily
>> against that, but would it be acceptable from your PoV? And if so,
>> should we keep the current internal BPF code for when libbpf is not
>> available, or would it be acceptable to not be able to load BPF programs
>> if libbpf is not present (similar to how the libelf dependency works
>> today)?
>
> iproute2 recently gained the libmnl dependency for extack. Seems like
> libbpf falls into the similar category.
>
>> 
>>> 2. git submodules can be a PITA to deal with (e.g., jumping between
>>> branches and versions), so there needs to be a good reason for it.
>> 
>> Yes, totally with you on that. Another option could be to just copy the
>> files into the iproute2 tree, and update them the same way the kernel
>> headers are? Or maybe doing fancy things like this:
>> https://github.com/apenwarr/git-subtrac
>
> kernel uapi is a totally different reason to import the headers. bpf
> functionality is an add-on.
>
> I would like to see iproute2 work with libbpf. Given libbpf's current
> status and availability across OS'es that is going to be a challenge for
> a lot of OS'es which is why I suggested the HAVE_LIBBPF check falls back
> to existing code if libbpf is not installed.

Sure, can do.

-Toke