diff mbox series

[bpf-next] bpf: libbpf: retry program creation without the name

Message ID 20181120004625.243494-1-sdf@google.com
State Changes Requested, archived
Delegated to: BPF Maintainers
Headers show
Series [bpf-next] bpf: libbpf: retry program creation without the name | expand

Commit Message

Stanislav Fomichev Nov. 20, 2018, 12:46 a.m. UTC
[Recent commit 23499442c319 ("bpf: libbpf: retry map creation without
the name") fixed this issue for maps, let's do the same for programs.]

Since commit 88cda1c9da02 ("bpf: libbpf: Provide basic API support
to specify BPF obj name"), libbpf unconditionally sets bpf_attr->name
for programs. Pre v4.14 kernels don't know about programs names and
return an error about unexpected non-zero data. Retry sys_bpf without
a program name to cover older kernels.

Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 tools/lib/bpf/bpf.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

Comments

Y Song Nov. 20, 2018, 1:24 a.m. UTC | #1
On Mon, Nov 19, 2018 at 4:52 PM Stanislav Fomichev <sdf@google.com> wrote:
>
> [Recent commit 23499442c319 ("bpf: libbpf: retry map creation without
> the name") fixed this issue for maps, let's do the same for programs.]
>
> Since commit 88cda1c9da02 ("bpf: libbpf: Provide basic API support
> to specify BPF obj name"), libbpf unconditionally sets bpf_attr->name
> for programs. Pre v4.14 kernels don't know about programs names and
> return an error about unexpected non-zero data. Retry sys_bpf without
> a program name to cover older kernels.
>
> Signed-off-by: Stanislav Fomichev <sdf@google.com>

Acked-by: Yonghong Song <yhs@fb.com>
Alexei Starovoitov Nov. 20, 2018, 7:51 p.m. UTC | #2
On Mon, Nov 19, 2018 at 04:46:25PM -0800, Stanislav Fomichev wrote:
> [Recent commit 23499442c319 ("bpf: libbpf: retry map creation without
> the name") fixed this issue for maps, let's do the same for programs.]
> 
> Since commit 88cda1c9da02 ("bpf: libbpf: Provide basic API support
> to specify BPF obj name"), libbpf unconditionally sets bpf_attr->name
> for programs. Pre v4.14 kernels don't know about programs names and
> return an error about unexpected non-zero data. Retry sys_bpf without
> a program name to cover older kernels.
> 
> Signed-off-by: Stanislav Fomichev <sdf@google.com>
> ---
>  tools/lib/bpf/bpf.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
> index 961e1b9fc592..cbe9d757c646 100644
> --- a/tools/lib/bpf/bpf.c
> +++ b/tools/lib/bpf/bpf.c
> @@ -212,6 +212,16 @@ int bpf_load_program_xattr(const struct bpf_load_program_attr *load_attr,
>  	if (fd >= 0 || !log_buf || !log_buf_sz)
>  		return fd;
>  
> +	if (fd < 0 && errno == E2BIG && load_attr->name) {
> +		/* Retry the same syscall, but without the name.
> +		 * Pre v4.14 kernels don't support prog names.
> +		 */

I'm afraid that will put unnecessary stress on the kernel.
This check needs to be tighter.
Like E2BIG and anything in the log_buf probably means that
E2BIG came from the verifier and nothing to do with prog_name.
Asking kernel to repeat is an unnecessary work.

In general we need to think beyond this single prog_name field.
There are bunch of other fields in bpf_load_program_xattr() and older kernels
won't support them. Are we going to zero them out one by one
and retry? I don't think that would be practical.

Also libbpf silently ignoring prog_name is not great for debugging.
A warning is needed.
But it cannot be done out of lib/bpf/bpf.c, since it's a set of syscall
wrappers.
Imo such "old kernel -> lets retry" feature should probably be done
at lib/bpf/libbpf.c level. inside load_program().
Stanislav Fomichev Nov. 20, 2018, 9:19 p.m. UTC | #3
On 11/20, Alexei Starovoitov wrote:
> On Mon, Nov 19, 2018 at 04:46:25PM -0800, Stanislav Fomichev wrote:
> > [Recent commit 23499442c319 ("bpf: libbpf: retry map creation without
> > the name") fixed this issue for maps, let's do the same for programs.]
> > 
> > Since commit 88cda1c9da02 ("bpf: libbpf: Provide basic API support
> > to specify BPF obj name"), libbpf unconditionally sets bpf_attr->name
> > for programs. Pre v4.14 kernels don't know about programs names and
> > return an error about unexpected non-zero data. Retry sys_bpf without
> > a program name to cover older kernels.
> > 
> > Signed-off-by: Stanislav Fomichev <sdf@google.com>
> > ---
> >  tools/lib/bpf/bpf.c | 10 ++++++++++
> >  1 file changed, 10 insertions(+)
> > 
> > diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
> > index 961e1b9fc592..cbe9d757c646 100644
> > --- a/tools/lib/bpf/bpf.c
> > +++ b/tools/lib/bpf/bpf.c
> > @@ -212,6 +212,16 @@ int bpf_load_program_xattr(const struct bpf_load_program_attr *load_attr,
> >  	if (fd >= 0 || !log_buf || !log_buf_sz)
> >  		return fd;
> >  
> > +	if (fd < 0 && errno == E2BIG && load_attr->name) {
> > +		/* Retry the same syscall, but without the name.
> > +		 * Pre v4.14 kernels don't support prog names.
> > +		 */
> 
> I'm afraid that will put unnecessary stress on the kernel.
> This check needs to be tighter.
> Like E2BIG and anything in the log_buf probably means that
> E2BIG came from the verifier and nothing to do with prog_name.
> Asking kernel to repeat is an unnecessary work.
> 
> In general we need to think beyond this single prog_name field.
> There are bunch of other fields in bpf_load_program_xattr() and older kernels
> won't support them. Are we going to zero them out one by one
> and retry? I don't think that would be practical.
I general, we don't want to zero anything out. However,
for this particular problem the rationale is the following:
In commit 88cda1c9da02 we started unconditionally setting {prog,map}->name
from the 'higher' libbpfc layer which breaks users on the older kernels.

> Also libbpf silently ignoring prog_name is not great for debugging.
> A warning is needed.
> But it cannot be done out of lib/bpf/bpf.c, since it's a set of syscall
> wrappers.
> Imo such "old kernel -> lets retry" feature should probably be done
> at lib/bpf/libbpf.c level. inside load_program().
For maps bpftools calls bpf_create_map_xattr directly, that's why
for maps I did the retry on the lower level (and why for programs I initially
thought about doing the same). However, in this case maybe asking
user to omit 'name' argument might be a better option.

For program names, I agree, we might think about doing it on the higher
level (although I'm not sure whether we want to have different API
expectations, i.e. bpf_create_map_xattr ignoring the name and
bpf_load_program_xattr not ignoring the name).

So given that rationale above, what do you think is the best way to
move forward?
1. Same patch, but tighten the retry check inside bpf_load_program_xattr ?
2. Move this retry logic into load_program and have different handling
   for bpf_create_map_xattr vs bpf_load_program_xattr ?
3. Do 2 and move the retry check for maps from bpf_create_map_xattr
   into bpf_object__create_maps ?

(I'm slightly leaning towards #3)
Alexei Starovoitov Nov. 20, 2018, 11:04 p.m. UTC | #4
On Tue, Nov 20, 2018 at 01:19:05PM -0800, Stanislav Fomichev wrote:
> On 11/20, Alexei Starovoitov wrote:
> > On Mon, Nov 19, 2018 at 04:46:25PM -0800, Stanislav Fomichev wrote:
> > > [Recent commit 23499442c319 ("bpf: libbpf: retry map creation without
> > > the name") fixed this issue for maps, let's do the same for programs.]
> > > 
> > > Since commit 88cda1c9da02 ("bpf: libbpf: Provide basic API support
> > > to specify BPF obj name"), libbpf unconditionally sets bpf_attr->name
> > > for programs. Pre v4.14 kernels don't know about programs names and
> > > return an error about unexpected non-zero data. Retry sys_bpf without
> > > a program name to cover older kernels.
> > > 
> > > Signed-off-by: Stanislav Fomichev <sdf@google.com>
> > > ---
> > >  tools/lib/bpf/bpf.c | 10 ++++++++++
> > >  1 file changed, 10 insertions(+)
> > > 
> > > diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
> > > index 961e1b9fc592..cbe9d757c646 100644
> > > --- a/tools/lib/bpf/bpf.c
> > > +++ b/tools/lib/bpf/bpf.c
> > > @@ -212,6 +212,16 @@ int bpf_load_program_xattr(const struct bpf_load_program_attr *load_attr,
> > >  	if (fd >= 0 || !log_buf || !log_buf_sz)
> > >  		return fd;
> > >  
> > > +	if (fd < 0 && errno == E2BIG && load_attr->name) {
> > > +		/* Retry the same syscall, but without the name.
> > > +		 * Pre v4.14 kernels don't support prog names.
> > > +		 */
> > 
> > I'm afraid that will put unnecessary stress on the kernel.
> > This check needs to be tighter.
> > Like E2BIG and anything in the log_buf probably means that
> > E2BIG came from the verifier and nothing to do with prog_name.
> > Asking kernel to repeat is an unnecessary work.
> > 
> > In general we need to think beyond this single prog_name field.
> > There are bunch of other fields in bpf_load_program_xattr() and older kernels
> > won't support them. Are we going to zero them out one by one
> > and retry? I don't think that would be practical.
> I general, we don't want to zero anything out. However,
> for this particular problem the rationale is the following:
> In commit 88cda1c9da02 we started unconditionally setting {prog,map}->name
> from the 'higher' libbpfc layer which breaks users on the older kernels.
> 
> > Also libbpf silently ignoring prog_name is not great for debugging.
> > A warning is needed.
> > But it cannot be done out of lib/bpf/bpf.c, since it's a set of syscall
> > wrappers.
> > Imo such "old kernel -> lets retry" feature should probably be done
> > at lib/bpf/libbpf.c level. inside load_program().
> For maps bpftools calls bpf_create_map_xattr directly, that's why
> for maps I did the retry on the lower level (and why for programs I initially
> thought about doing the same). However, in this case maybe asking
> user to omit 'name' argument might be a better option.
> 
> For program names, I agree, we might think about doing it on the higher
> level (although I'm not sure whether we want to have different API
> expectations, i.e. bpf_create_map_xattr ignoring the name and
> bpf_load_program_xattr not ignoring the name).
> 
> So given that rationale above, what do you think is the best way to
> move forward?
> 1. Same patch, but tighten the retry check inside bpf_load_program_xattr ?
> 2. Move this retry logic into load_program and have different handling
>    for bpf_create_map_xattr vs bpf_load_program_xattr ?
> 3. Do 2 and move the retry check for maps from bpf_create_map_xattr
>    into bpf_object__create_maps ?
> 
> (I'm slightly leaning towards #3)

me too. I think it's cleaner for maps to do it in
bpf_object__create_maps().
Originally bpf.c was envisioned to be a thin layer on top of bpf syscall.
Whereas 'smart bits' would go into libbpf.c
Right now this boundary is unfortunately blurry.
May be as #4 long term option we'll introduce another 'smart' layer
between bpf.c that will assume the latest kernel and libbpf.c that deals
with elf. May be will call this new layer a 'compat' layer?
For now I think doing #3 as you suggested is probably the best short term.
Daniel Borkmann Nov. 20, 2018, 11:18 p.m. UTC | #5
On 11/21/2018 12:04 AM, Alexei Starovoitov wrote:
> On Tue, Nov 20, 2018 at 01:19:05PM -0800, Stanislav Fomichev wrote:
>> On 11/20, Alexei Starovoitov wrote:
>>> On Mon, Nov 19, 2018 at 04:46:25PM -0800, Stanislav Fomichev wrote:
>>>> [Recent commit 23499442c319 ("bpf: libbpf: retry map creation without
>>>> the name") fixed this issue for maps, let's do the same for programs.]
>>>>
>>>> Since commit 88cda1c9da02 ("bpf: libbpf: Provide basic API support
>>>> to specify BPF obj name"), libbpf unconditionally sets bpf_attr->name
>>>> for programs. Pre v4.14 kernels don't know about programs names and
>>>> return an error about unexpected non-zero data. Retry sys_bpf without
>>>> a program name to cover older kernels.
>>>>
>>>> Signed-off-by: Stanislav Fomichev <sdf@google.com>
>>>> ---
>>>>  tools/lib/bpf/bpf.c | 10 ++++++++++
>>>>  1 file changed, 10 insertions(+)
>>>>
>>>> diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
>>>> index 961e1b9fc592..cbe9d757c646 100644
>>>> --- a/tools/lib/bpf/bpf.c
>>>> +++ b/tools/lib/bpf/bpf.c
>>>> @@ -212,6 +212,16 @@ int bpf_load_program_xattr(const struct bpf_load_program_attr *load_attr,
>>>>  	if (fd >= 0 || !log_buf || !log_buf_sz)
>>>>  		return fd;
>>>>  
>>>> +	if (fd < 0 && errno == E2BIG && load_attr->name) {
>>>> +		/* Retry the same syscall, but without the name.
>>>> +		 * Pre v4.14 kernels don't support prog names.
>>>> +		 */
>>>
>>> I'm afraid that will put unnecessary stress on the kernel.
>>> This check needs to be tighter.
>>> Like E2BIG and anything in the log_buf probably means that
>>> E2BIG came from the verifier and nothing to do with prog_name.
>>> Asking kernel to repeat is an unnecessary work.
>>>
>>> In general we need to think beyond this single prog_name field.
>>> There are bunch of other fields in bpf_load_program_xattr() and older kernels
>>> won't support them. Are we going to zero them out one by one
>>> and retry? I don't think that would be practical.
>> I general, we don't want to zero anything out. However,
>> for this particular problem the rationale is the following:
>> In commit 88cda1c9da02 we started unconditionally setting {prog,map}->name
>> from the 'higher' libbpfc layer which breaks users on the older kernels.
>>
>>> Also libbpf silently ignoring prog_name is not great for debugging.
>>> A warning is needed.
>>> But it cannot be done out of lib/bpf/bpf.c, since it's a set of syscall
>>> wrappers.
>>> Imo such "old kernel -> lets retry" feature should probably be done
>>> at lib/bpf/libbpf.c level. inside load_program().
>> For maps bpftools calls bpf_create_map_xattr directly, that's why
>> for maps I did the retry on the lower level (and why for programs I initially
>> thought about doing the same). However, in this case maybe asking
>> user to omit 'name' argument might be a better option.
>>
>> For program names, I agree, we might think about doing it on the higher
>> level (although I'm not sure whether we want to have different API
>> expectations, i.e. bpf_create_map_xattr ignoring the name and
>> bpf_load_program_xattr not ignoring the name).
>>
>> So given that rationale above, what do you think is the best way to
>> move forward?
>> 1. Same patch, but tighten the retry check inside bpf_load_program_xattr ?
>> 2. Move this retry logic into load_program and have different handling
>>    for bpf_create_map_xattr vs bpf_load_program_xattr ?
>> 3. Do 2 and move the retry check for maps from bpf_create_map_xattr
>>    into bpf_object__create_maps ?
>>
>> (I'm slightly leaning towards #3)
> 
> me too. I think it's cleaner for maps to do it in
> bpf_object__create_maps().
> Originally bpf.c was envisioned to be a thin layer on top of bpf syscall.
> Whereas 'smart bits' would go into libbpf.c

Can't we create in bpf_object__load() a small helper bpf_object__probe_caps()
which would figure this out _once_ upon start with a few things to probe for
availability in the underlying kernel for maps and programs? E.g. programs
it could try to inject a tiny 'r0 = 0; exit' snippet where we figure out
things like prog name support etc. Given underlying kernel doesn't change, we
would only try this once and it doesn't require fallback every time.

> Right now this boundary is unfortunately blurry.
> May be as #4 long term option we'll introduce another 'smart' layer
> between bpf.c that will assume the latest kernel and libbpf.c that deals
> with elf. May be will call this new layer a 'compat' layer?
> For now I think doing #3 as you suggested is probably the best short term.
>
Alexei Starovoitov Nov. 20, 2018, 11:20 p.m. UTC | #6
On Wed, Nov 21, 2018 at 12:18:57AM +0100, Daniel Borkmann wrote:
> On 11/21/2018 12:04 AM, Alexei Starovoitov wrote:
> > On Tue, Nov 20, 2018 at 01:19:05PM -0800, Stanislav Fomichev wrote:
> >> On 11/20, Alexei Starovoitov wrote:
> >>> On Mon, Nov 19, 2018 at 04:46:25PM -0800, Stanislav Fomichev wrote:
> >>>> [Recent commit 23499442c319 ("bpf: libbpf: retry map creation without
> >>>> the name") fixed this issue for maps, let's do the same for programs.]
> >>>>
> >>>> Since commit 88cda1c9da02 ("bpf: libbpf: Provide basic API support
> >>>> to specify BPF obj name"), libbpf unconditionally sets bpf_attr->name
> >>>> for programs. Pre v4.14 kernels don't know about programs names and
> >>>> return an error about unexpected non-zero data. Retry sys_bpf without
> >>>> a program name to cover older kernels.
> >>>>
> >>>> Signed-off-by: Stanislav Fomichev <sdf@google.com>
> >>>> ---
> >>>>  tools/lib/bpf/bpf.c | 10 ++++++++++
> >>>>  1 file changed, 10 insertions(+)
> >>>>
> >>>> diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
> >>>> index 961e1b9fc592..cbe9d757c646 100644
> >>>> --- a/tools/lib/bpf/bpf.c
> >>>> +++ b/tools/lib/bpf/bpf.c
> >>>> @@ -212,6 +212,16 @@ int bpf_load_program_xattr(const struct bpf_load_program_attr *load_attr,
> >>>>  	if (fd >= 0 || !log_buf || !log_buf_sz)
> >>>>  		return fd;
> >>>>  
> >>>> +	if (fd < 0 && errno == E2BIG && load_attr->name) {
> >>>> +		/* Retry the same syscall, but without the name.
> >>>> +		 * Pre v4.14 kernels don't support prog names.
> >>>> +		 */
> >>>
> >>> I'm afraid that will put unnecessary stress on the kernel.
> >>> This check needs to be tighter.
> >>> Like E2BIG and anything in the log_buf probably means that
> >>> E2BIG came from the verifier and nothing to do with prog_name.
> >>> Asking kernel to repeat is an unnecessary work.
> >>>
> >>> In general we need to think beyond this single prog_name field.
> >>> There are bunch of other fields in bpf_load_program_xattr() and older kernels
> >>> won't support them. Are we going to zero them out one by one
> >>> and retry? I don't think that would be practical.
> >> I general, we don't want to zero anything out. However,
> >> for this particular problem the rationale is the following:
> >> In commit 88cda1c9da02 we started unconditionally setting {prog,map}->name
> >> from the 'higher' libbpfc layer which breaks users on the older kernels.
> >>
> >>> Also libbpf silently ignoring prog_name is not great for debugging.
> >>> A warning is needed.
> >>> But it cannot be done out of lib/bpf/bpf.c, since it's a set of syscall
> >>> wrappers.
> >>> Imo such "old kernel -> lets retry" feature should probably be done
> >>> at lib/bpf/libbpf.c level. inside load_program().
> >> For maps bpftools calls bpf_create_map_xattr directly, that's why
> >> for maps I did the retry on the lower level (and why for programs I initially
> >> thought about doing the same). However, in this case maybe asking
> >> user to omit 'name' argument might be a better option.
> >>
> >> For program names, I agree, we might think about doing it on the higher
> >> level (although I'm not sure whether we want to have different API
> >> expectations, i.e. bpf_create_map_xattr ignoring the name and
> >> bpf_load_program_xattr not ignoring the name).
> >>
> >> So given that rationale above, what do you think is the best way to
> >> move forward?
> >> 1. Same patch, but tighten the retry check inside bpf_load_program_xattr ?
> >> 2. Move this retry logic into load_program and have different handling
> >>    for bpf_create_map_xattr vs bpf_load_program_xattr ?
> >> 3. Do 2 and move the retry check for maps from bpf_create_map_xattr
> >>    into bpf_object__create_maps ?
> >>
> >> (I'm slightly leaning towards #3)
> > 
> > me too. I think it's cleaner for maps to do it in
> > bpf_object__create_maps().
> > Originally bpf.c was envisioned to be a thin layer on top of bpf syscall.
> > Whereas 'smart bits' would go into libbpf.c
> 
> Can't we create in bpf_object__load() a small helper bpf_object__probe_caps()
> which would figure this out _once_ upon start with a few things to probe for
> availability in the underlying kernel for maps and programs? E.g. programs
> it could try to inject a tiny 'r0 = 0; exit' snippet where we figure out
> things like prog name support etc. Given underlying kernel doesn't change, we
> would only try this once and it doesn't require fallback every time.

+1. great idea!
Stanislav Fomichev Nov. 20, 2018, 11:26 p.m. UTC | #7
On 11/20, Alexei Starovoitov wrote:
> On Wed, Nov 21, 2018 at 12:18:57AM +0100, Daniel Borkmann wrote:
> > On 11/21/2018 12:04 AM, Alexei Starovoitov wrote:
> > > On Tue, Nov 20, 2018 at 01:19:05PM -0800, Stanislav Fomichev wrote:
> > >> On 11/20, Alexei Starovoitov wrote:
> > >>> On Mon, Nov 19, 2018 at 04:46:25PM -0800, Stanislav Fomichev wrote:
> > >>>> [Recent commit 23499442c319 ("bpf: libbpf: retry map creation without
> > >>>> the name") fixed this issue for maps, let's do the same for programs.]
> > >>>>
> > >>>> Since commit 88cda1c9da02 ("bpf: libbpf: Provide basic API support
> > >>>> to specify BPF obj name"), libbpf unconditionally sets bpf_attr->name
> > >>>> for programs. Pre v4.14 kernels don't know about programs names and
> > >>>> return an error about unexpected non-zero data. Retry sys_bpf without
> > >>>> a program name to cover older kernels.
> > >>>>
> > >>>> Signed-off-by: Stanislav Fomichev <sdf@google.com>
> > >>>> ---
> > >>>>  tools/lib/bpf/bpf.c | 10 ++++++++++
> > >>>>  1 file changed, 10 insertions(+)
> > >>>>
> > >>>> diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
> > >>>> index 961e1b9fc592..cbe9d757c646 100644
> > >>>> --- a/tools/lib/bpf/bpf.c
> > >>>> +++ b/tools/lib/bpf/bpf.c
> > >>>> @@ -212,6 +212,16 @@ int bpf_load_program_xattr(const struct bpf_load_program_attr *load_attr,
> > >>>>  	if (fd >= 0 || !log_buf || !log_buf_sz)
> > >>>>  		return fd;
> > >>>>  
> > >>>> +	if (fd < 0 && errno == E2BIG && load_attr->name) {
> > >>>> +		/* Retry the same syscall, but without the name.
> > >>>> +		 * Pre v4.14 kernels don't support prog names.
> > >>>> +		 */
> > >>>
> > >>> I'm afraid that will put unnecessary stress on the kernel.
> > >>> This check needs to be tighter.
> > >>> Like E2BIG and anything in the log_buf probably means that
> > >>> E2BIG came from the verifier and nothing to do with prog_name.
> > >>> Asking kernel to repeat is an unnecessary work.
> > >>>
> > >>> In general we need to think beyond this single prog_name field.
> > >>> There are bunch of other fields in bpf_load_program_xattr() and older kernels
> > >>> won't support them. Are we going to zero them out one by one
> > >>> and retry? I don't think that would be practical.
> > >> I general, we don't want to zero anything out. However,
> > >> for this particular problem the rationale is the following:
> > >> In commit 88cda1c9da02 we started unconditionally setting {prog,map}->name
> > >> from the 'higher' libbpfc layer which breaks users on the older kernels.
> > >>
> > >>> Also libbpf silently ignoring prog_name is not great for debugging.
> > >>> A warning is needed.
> > >>> But it cannot be done out of lib/bpf/bpf.c, since it's a set of syscall
> > >>> wrappers.
> > >>> Imo such "old kernel -> lets retry" feature should probably be done
> > >>> at lib/bpf/libbpf.c level. inside load_program().
> > >> For maps bpftools calls bpf_create_map_xattr directly, that's why
> > >> for maps I did the retry on the lower level (and why for programs I initially
> > >> thought about doing the same). However, in this case maybe asking
> > >> user to omit 'name' argument might be a better option.
> > >>
> > >> For program names, I agree, we might think about doing it on the higher
> > >> level (although I'm not sure whether we want to have different API
> > >> expectations, i.e. bpf_create_map_xattr ignoring the name and
> > >> bpf_load_program_xattr not ignoring the name).
> > >>
> > >> So given that rationale above, what do you think is the best way to
> > >> move forward?
> > >> 1. Same patch, but tighten the retry check inside bpf_load_program_xattr ?
> > >> 2. Move this retry logic into load_program and have different handling
> > >>    for bpf_create_map_xattr vs bpf_load_program_xattr ?
> > >> 3. Do 2 and move the retry check for maps from bpf_create_map_xattr
> > >>    into bpf_object__create_maps ?
> > >>
> > >> (I'm slightly leaning towards #3)
> > > 
> > > me too. I think it's cleaner for maps to do it in
> > > bpf_object__create_maps().
> > > Originally bpf.c was envisioned to be a thin layer on top of bpf syscall.
> > > Whereas 'smart bits' would go into libbpf.c
> > 
> > Can't we create in bpf_object__load() a small helper bpf_object__probe_caps()
> > which would figure this out _once_ upon start with a few things to probe for
> > availability in the underlying kernel for maps and programs? E.g. programs
> > it could try to inject a tiny 'r0 = 0; exit' snippet where we figure out
> > things like prog name support etc. Given underlying kernel doesn't change, we
> > would only try this once and it doesn't require fallback every time.
> 
> +1. great idea!
Sounds good, let me try to do it.

It sounds more like a recent LPC proposal/idea to have some sys_bpf option
to query BPF features. This new bpf_object__probe_caps can probably query
that in the future if we eventually add support for it.
Quentin Monnet Nov. 21, 2018, 2:49 a.m. UTC | #8
2018-11-20 15:26 UTC-0800 ~ Stanislav Fomichev <sdf@fomichev.me>
> On 11/20, Alexei Starovoitov wrote:
>> On Wed, Nov 21, 2018 at 12:18:57AM +0100, Daniel Borkmann wrote:
>>> On 11/21/2018 12:04 AM, Alexei Starovoitov wrote:
>>>> On Tue, Nov 20, 2018 at 01:19:05PM -0800, Stanislav Fomichev wrote:
>>>>> On 11/20, Alexei Starovoitov wrote:
>>>>>> On Mon, Nov 19, 2018 at 04:46:25PM -0800, Stanislav Fomichev wrote:
>>>>>>> [Recent commit 23499442c319 ("bpf: libbpf: retry map creation without
>>>>>>> the name") fixed this issue for maps, let's do the same for programs.]
>>>>>>>
>>>>>>> Since commit 88cda1c9da02 ("bpf: libbpf: Provide basic API support
>>>>>>> to specify BPF obj name"), libbpf unconditionally sets bpf_attr->name
>>>>>>> for programs. Pre v4.14 kernels don't know about programs names and
>>>>>>> return an error about unexpected non-zero data. Retry sys_bpf without
>>>>>>> a program name to cover older kernels.
>>>>>>>
>>>>>>> Signed-off-by: Stanislav Fomichev <sdf@google.com>
>>>>>>> ---
>>>>>>>  tools/lib/bpf/bpf.c | 10 ++++++++++
>>>>>>>  1 file changed, 10 insertions(+)
>>>>>>>
>>>>>>> diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
>>>>>>> index 961e1b9fc592..cbe9d757c646 100644
>>>>>>> --- a/tools/lib/bpf/bpf.c
>>>>>>> +++ b/tools/lib/bpf/bpf.c
>>>>>>> @@ -212,6 +212,16 @@ int bpf_load_program_xattr(const struct bpf_load_program_attr *load_attr,
>>>>>>>  	if (fd >= 0 || !log_buf || !log_buf_sz)
>>>>>>>  		return fd;
>>>>>>>  
>>>>>>> +	if (fd < 0 && errno == E2BIG && load_attr->name) {
>>>>>>> +		/* Retry the same syscall, but without the name.
>>>>>>> +		 * Pre v4.14 kernels don't support prog names.
>>>>>>> +		 */
>>>>>>
>>>>>> I'm afraid that will put unnecessary stress on the kernel.
>>>>>> This check needs to be tighter.
>>>>>> Like E2BIG and anything in the log_buf probably means that
>>>>>> E2BIG came from the verifier and nothing to do with prog_name.
>>>>>> Asking kernel to repeat is an unnecessary work.
>>>>>>
>>>>>> In general we need to think beyond this single prog_name field.
>>>>>> There are bunch of other fields in bpf_load_program_xattr() and older kernels
>>>>>> won't support them. Are we going to zero them out one by one
>>>>>> and retry? I don't think that would be practical.
>>>>> I general, we don't want to zero anything out. However,
>>>>> for this particular problem the rationale is the following:
>>>>> In commit 88cda1c9da02 we started unconditionally setting {prog,map}->name
>>>>> from the 'higher' libbpfc layer which breaks users on the older kernels.
>>>>>
>>>>>> Also libbpf silently ignoring prog_name is not great for debugging.
>>>>>> A warning is needed.
>>>>>> But it cannot be done out of lib/bpf/bpf.c, since it's a set of syscall
>>>>>> wrappers.
>>>>>> Imo such "old kernel -> lets retry" feature should probably be done
>>>>>> at lib/bpf/libbpf.c level. inside load_program().
>>>>> For maps bpftools calls bpf_create_map_xattr directly, that's why
>>>>> for maps I did the retry on the lower level (and why for programs I initially
>>>>> thought about doing the same). However, in this case maybe asking
>>>>> user to omit 'name' argument might be a better option.
>>>>>
>>>>> For program names, I agree, we might think about doing it on the higher
>>>>> level (although I'm not sure whether we want to have different API
>>>>> expectations, i.e. bpf_create_map_xattr ignoring the name and
>>>>> bpf_load_program_xattr not ignoring the name).
>>>>>
>>>>> So given that rationale above, what do you think is the best way to
>>>>> move forward?
>>>>> 1. Same patch, but tighten the retry check inside bpf_load_program_xattr ?
>>>>> 2. Move this retry logic into load_program and have different handling
>>>>>    for bpf_create_map_xattr vs bpf_load_program_xattr ?
>>>>> 3. Do 2 and move the retry check for maps from bpf_create_map_xattr
>>>>>    into bpf_object__create_maps ?
>>>>>
>>>>> (I'm slightly leaning towards #3)
>>>>
>>>> me too. I think it's cleaner for maps to do it in
>>>> bpf_object__create_maps().
>>>> Originally bpf.c was envisioned to be a thin layer on top of bpf syscall.
>>>> Whereas 'smart bits' would go into libbpf.c
>>>
>>> Can't we create in bpf_object__load() a small helper bpf_object__probe_caps()
>>> which would figure this out _once_ upon start with a few things to probe for
>>> availability in the underlying kernel for maps and programs? E.g. programs
>>> it could try to inject a tiny 'r0 = 0; exit' snippet where we figure out
>>> things like prog name support etc. Given underlying kernel doesn't change, we
>>> would only try this once and it doesn't require fallback every time.
>>
>> +1. great idea!
> Sounds good, let me try to do it.
> 
> It sounds more like a recent LPC proposal/idea to have some sys_bpf option
> to query BPF features. This new bpf_object__probe_caps can probably query
> that in the future if we eventually add support for it.
> 

Hi,

LPC proposal indeed. I've been working on implementing this kind of
probes in bpftool. I don't probe name support for now (but I can
certainly add it), but I detect supported program types, map types,
header functions, and a couple of other parameters. The idea (initially
from Daniel) was to dump "#define" declarations that could later be
included in a header file and used for a BPF project (or alternatively,
JSON output).

I felt like bpftool was maybe a better place to do it, as the set of
probes may grow large (all types, helpers, etc). It might have
consequences on the running system: for example, if I don't raise the
rlimit in bpftool before starting the probes, a good half of them fail
because of consecutive program creation and reclaim delay for locked
memory usage.

So should we start adding probes to libbpf and should I move mine to the
lib as well, or should the one in the v3 of this series be directed to
something like bpftool instead?

Thanks,
Quentin
Stanislav Fomichev Nov. 21, 2018, 5:28 p.m. UTC | #9
On 11/21, Quentin Monnet wrote:
> 2018-11-20 15:26 UTC-0800 ~ Stanislav Fomichev <sdf@fomichev.me>
> > On 11/20, Alexei Starovoitov wrote:
> >> On Wed, Nov 21, 2018 at 12:18:57AM +0100, Daniel Borkmann wrote:
> >>> On 11/21/2018 12:04 AM, Alexei Starovoitov wrote:
> >>>> On Tue, Nov 20, 2018 at 01:19:05PM -0800, Stanislav Fomichev wrote:
> >>>>> On 11/20, Alexei Starovoitov wrote:
> >>>>>> On Mon, Nov 19, 2018 at 04:46:25PM -0800, Stanislav Fomichev wrote:
> >>>>>>> [Recent commit 23499442c319 ("bpf: libbpf: retry map creation without
> >>>>>>> the name") fixed this issue for maps, let's do the same for programs.]
> >>>>>>>
> >>>>>>> Since commit 88cda1c9da02 ("bpf: libbpf: Provide basic API support
> >>>>>>> to specify BPF obj name"), libbpf unconditionally sets bpf_attr->name
> >>>>>>> for programs. Pre v4.14 kernels don't know about programs names and
> >>>>>>> return an error about unexpected non-zero data. Retry sys_bpf without
> >>>>>>> a program name to cover older kernels.
> >>>>>>>
> >>>>>>> Signed-off-by: Stanislav Fomichev <sdf@google.com>
> >>>>>>> ---
> >>>>>>>  tools/lib/bpf/bpf.c | 10 ++++++++++
> >>>>>>>  1 file changed, 10 insertions(+)
> >>>>>>>
> >>>>>>> diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
> >>>>>>> index 961e1b9fc592..cbe9d757c646 100644
> >>>>>>> --- a/tools/lib/bpf/bpf.c
> >>>>>>> +++ b/tools/lib/bpf/bpf.c
> >>>>>>> @@ -212,6 +212,16 @@ int bpf_load_program_xattr(const struct bpf_load_program_attr *load_attr,
> >>>>>>>  	if (fd >= 0 || !log_buf || !log_buf_sz)
> >>>>>>>  		return fd;
> >>>>>>>  
> >>>>>>> +	if (fd < 0 && errno == E2BIG && load_attr->name) {
> >>>>>>> +		/* Retry the same syscall, but without the name.
> >>>>>>> +		 * Pre v4.14 kernels don't support prog names.
> >>>>>>> +		 */
> >>>>>>
> >>>>>> I'm afraid that will put unnecessary stress on the kernel.
> >>>>>> This check needs to be tighter.
> >>>>>> Like E2BIG and anything in the log_buf probably means that
> >>>>>> E2BIG came from the verifier and nothing to do with prog_name.
> >>>>>> Asking kernel to repeat is an unnecessary work.
> >>>>>>
> >>>>>> In general we need to think beyond this single prog_name field.
> >>>>>> There are bunch of other fields in bpf_load_program_xattr() and older kernels
> >>>>>> won't support them. Are we going to zero them out one by one
> >>>>>> and retry? I don't think that would be practical.
> >>>>> I general, we don't want to zero anything out. However,
> >>>>> for this particular problem the rationale is the following:
> >>>>> In commit 88cda1c9da02 we started unconditionally setting {prog,map}->name
> >>>>> from the 'higher' libbpfc layer which breaks users on the older kernels.
> >>>>>
> >>>>>> Also libbpf silently ignoring prog_name is not great for debugging.
> >>>>>> A warning is needed.
> >>>>>> But it cannot be done out of lib/bpf/bpf.c, since it's a set of syscall
> >>>>>> wrappers.
> >>>>>> Imo such "old kernel -> lets retry" feature should probably be done
> >>>>>> at lib/bpf/libbpf.c level. inside load_program().
> >>>>> For maps bpftools calls bpf_create_map_xattr directly, that's why
> >>>>> for maps I did the retry on the lower level (and why for programs I initially
> >>>>> thought about doing the same). However, in this case maybe asking
> >>>>> user to omit 'name' argument might be a better option.
> >>>>>
> >>>>> For program names, I agree, we might think about doing it on the higher
> >>>>> level (although I'm not sure whether we want to have different API
> >>>>> expectations, i.e. bpf_create_map_xattr ignoring the name and
> >>>>> bpf_load_program_xattr not ignoring the name).
> >>>>>
> >>>>> So given that rationale above, what do you think is the best way to
> >>>>> move forward?
> >>>>> 1. Same patch, but tighten the retry check inside bpf_load_program_xattr ?
> >>>>> 2. Move this retry logic into load_program and have different handling
> >>>>>    for bpf_create_map_xattr vs bpf_load_program_xattr ?
> >>>>> 3. Do 2 and move the retry check for maps from bpf_create_map_xattr
> >>>>>    into bpf_object__create_maps ?
> >>>>>
> >>>>> (I'm slightly leaning towards #3)
> >>>>
> >>>> me too. I think it's cleaner for maps to do it in
> >>>> bpf_object__create_maps().
> >>>> Originally bpf.c was envisioned to be a thin layer on top of bpf syscall.
> >>>> Whereas 'smart bits' would go into libbpf.c
> >>>
> >>> Can't we create in bpf_object__load() a small helper bpf_object__probe_caps()
> >>> which would figure this out _once_ upon start with a few things to probe for
> >>> availability in the underlying kernel for maps and programs? E.g. programs
> >>> it could try to inject a tiny 'r0 = 0; exit' snippet where we figure out
> >>> things like prog name support etc. Given underlying kernel doesn't change, we
> >>> would only try this once and it doesn't require fallback every time.
> >>
> >> +1. great idea!
> > Sounds good, let me try to do it.
> > 
> > It sounds more like a recent LPC proposal/idea to have some sys_bpf option
> > to query BPF features. This new bpf_object__probe_caps can probably query
> > that in the future if we eventually add support for it.
> > 
> 
> Hi,
> 
> LPC proposal indeed. I've been working on implementing this kind of
> probes in bpftool. I don't probe name support for now (but I can
> certainly add it), but I detect supported program types, map types,
> header functions, and a couple of other parameters. The idea (initially
> from Daniel) was to dump "#define" declarations that could later be
> included in a header file and used for a BPF project (or alternatively,
> JSON output).
Oh, nice, I didn't know someone was already working on it!

> I felt like bpftool was maybe a better place to do it, as the set of
> probes may grow large (all types, helpers, etc). It might have
> consequences on the running system: for example, if I don't raise the
> rlimit in bpftool before starting the probes, a good half of them fail
> because of consecutive program creation and reclaim delay for locked
> memory usage.
Should we aim for something like build system feature checks? Where we
preserve these probe results between bpftool/libbpf invocations so we
don't re-run them again?

> So should we start adding probes to libbpf and should I move mine to the
> lib as well, or should the one in the v3 of this series be directed to
> something like bpftool instead?
We need them (well, at least the name checks) for libbpf because that's
what we link against and what we use to load the programs, bpftool is
less of an issue right now. But my patch was mostly a hackish solution
until we get the real feature checks :-)
Quentin Monnet Nov. 23, 2018, 10:51 a.m. UTC | #10
2018-11-21 09:28 UTC-0800 ~ Stanislav Fomichev <sdf@fomichev.me>
> On 11/21, Quentin Monnet wrote:
>> 2018-11-20 15:26 UTC-0800 ~ Stanislav Fomichev <sdf@fomichev.me>
>>> On 11/20, Alexei Starovoitov wrote:
>>>> On Wed, Nov 21, 2018 at 12:18:57AM +0100, Daniel Borkmann wrote:
>>>>> On 11/21/2018 12:04 AM, Alexei Starovoitov wrote:
>>>>>> On Tue, Nov 20, 2018 at 01:19:05PM -0800, Stanislav Fomichev wrote:
>>>>>>> On 11/20, Alexei Starovoitov wrote:
>>>>>>>> On Mon, Nov 19, 2018 at 04:46:25PM -0800, Stanislav Fomichev wrote:
>>>>>>>>> [Recent commit 23499442c319 ("bpf: libbpf: retry map creation without
>>>>>>>>> the name") fixed this issue for maps, let's do the same for programs.]
>>>>>>>>>
>>>>>>>>> Since commit 88cda1c9da02 ("bpf: libbpf: Provide basic API support
>>>>>>>>> to specify BPF obj name"), libbpf unconditionally sets bpf_attr->name
>>>>>>>>> for programs. Pre v4.14 kernels don't know about programs names and
>>>>>>>>> return an error about unexpected non-zero data. Retry sys_bpf without
>>>>>>>>> a program name to cover older kernels.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Stanislav Fomichev <sdf@google.com>
>>>>>>>>> ---
>>>>>>>>>   tools/lib/bpf/bpf.c | 10 ++++++++++
>>>>>>>>>   1 file changed, 10 insertions(+)
>>>>>>>>>
>>>>>>>>> diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
>>>>>>>>> index 961e1b9fc592..cbe9d757c646 100644
>>>>>>>>> --- a/tools/lib/bpf/bpf.c
>>>>>>>>> +++ b/tools/lib/bpf/bpf.c
>>>>>>>>> @@ -212,6 +212,16 @@ int bpf_load_program_xattr(const struct bpf_load_program_attr *load_attr,
>>>>>>>>>   	if (fd >= 0 || !log_buf || !log_buf_sz)
>>>>>>>>>   		return fd;
>>>>>>>>>   
>>>>>>>>> +	if (fd < 0 && errno == E2BIG && load_attr->name) {
>>>>>>>>> +		/* Retry the same syscall, but without the name.
>>>>>>>>> +		 * Pre v4.14 kernels don't support prog names.
>>>>>>>>> +		 */
>>>>>>>>
>>>>>>>> I'm afraid that will put unnecessary stress on the kernel.
>>>>>>>> This check needs to be tighter.
>>>>>>>> Like E2BIG and anything in the log_buf probably means that
>>>>>>>> E2BIG came from the verifier and nothing to do with prog_name.
>>>>>>>> Asking kernel to repeat is an unnecessary work.
>>>>>>>>
>>>>>>>> In general we need to think beyond this single prog_name field.
>>>>>>>> There are bunch of other fields in bpf_load_program_xattr() and older kernels
>>>>>>>> won't support them. Are we going to zero them out one by one
>>>>>>>> and retry? I don't think that would be practical.
>>>>>>> I general, we don't want to zero anything out. However,
>>>>>>> for this particular problem the rationale is the following:
>>>>>>> In commit 88cda1c9da02 we started unconditionally setting {prog,map}->name
>>>>>>> from the 'higher' libbpfc layer which breaks users on the older kernels.
>>>>>>>
>>>>>>>> Also libbpf silently ignoring prog_name is not great for debugging.
>>>>>>>> A warning is needed.
>>>>>>>> But it cannot be done out of lib/bpf/bpf.c, since it's a set of syscall
>>>>>>>> wrappers.
>>>>>>>> Imo such "old kernel -> lets retry" feature should probably be done
>>>>>>>> at lib/bpf/libbpf.c level. inside load_program().
>>>>>>> For maps bpftools calls bpf_create_map_xattr directly, that's why
>>>>>>> for maps I did the retry on the lower level (and why for programs I initially
>>>>>>> thought about doing the same). However, in this case maybe asking
>>>>>>> user to omit 'name' argument might be a better option.
>>>>>>>
>>>>>>> For program names, I agree, we might think about doing it on the higher
>>>>>>> level (although I'm not sure whether we want to have different API
>>>>>>> expectations, i.e. bpf_create_map_xattr ignoring the name and
>>>>>>> bpf_load_program_xattr not ignoring the name).
>>>>>>>
>>>>>>> So given that rationale above, what do you think is the best way to
>>>>>>> move forward?
>>>>>>> 1. Same patch, but tighten the retry check inside bpf_load_program_xattr ?
>>>>>>> 2. Move this retry logic into load_program and have different handling
>>>>>>>     for bpf_create_map_xattr vs bpf_load_program_xattr ?
>>>>>>> 3. Do 2 and move the retry check for maps from bpf_create_map_xattr
>>>>>>>     into bpf_object__create_maps ?
>>>>>>>
>>>>>>> (I'm slightly leaning towards #3)
>>>>>>
>>>>>> me too. I think it's cleaner for maps to do it in
>>>>>> bpf_object__create_maps().
>>>>>> Originally bpf.c was envisioned to be a thin layer on top of bpf syscall.
>>>>>> Whereas 'smart bits' would go into libbpf.c
>>>>>
>>>>> Can't we create in bpf_object__load() a small helper bpf_object__probe_caps()
>>>>> which would figure this out _once_ upon start with a few things to probe for
>>>>> availability in the underlying kernel for maps and programs? E.g. programs
>>>>> it could try to inject a tiny 'r0 = 0; exit' snippet where we figure out
>>>>> things like prog name support etc. Given underlying kernel doesn't change, we
>>>>> would only try this once and it doesn't require fallback every time.
>>>>
>>>> +1. great idea!
>>> Sounds good, let me try to do it.
>>>
>>> It sounds more like a recent LPC proposal/idea to have some sys_bpf option
>>> to query BPF features. This new bpf_object__probe_caps can probably query
>>> that in the future if we eventually add support for it.
>>>
>>
>> Hi,
>>
>> LPC proposal indeed. I've been working on implementing this kind of
>> probes in bpftool. I don't probe name support for now (but I can
>> certainly add it), but I detect supported program types, map types,
>> header functions, and a couple of other parameters. The idea (initially
>> from Daniel) was to dump "#define" declarations that could later be
>> included in a header file and used for a BPF project (or alternatively,
>> JSON output).
> Oh, nice, I didn't know someone was already working on it!
> 
>> I felt like bpftool was maybe a better place to do it, as the set of
>> probes may grow large (all types, helpers, etc). It might have
>> consequences on the running system: for example, if I don't raise the
>> rlimit in bpftool before starting the probes, a good half of them fail
>> because of consecutive program creation and reclaim delay for locked
>> memory usage.
> Should we aim for something like build system feature checks? Where we
> preserve these probe results between bpftool/libbpf invocations so we
> don't re-run them again?
> 
>> So should we start adding probes to libbpf and should I move mine to the
>> lib as well, or should the one in the v3 of this series be directed to
>> something like bpftool instead?
> We need them (well, at least the name checks) for libbpf because that's
> what we link against and what we use to load the programs, bpftool is
> less of an issue right now. But my patch was mostly a hackish solution
> until we get the real feature checks :-)

Hi Stanislav,
Apologies for the delayed answer, I have been travelling.

I don't know if the probes should be "shared" with libbpf. What I had in 
mind was rather to run them with bpftool and to produce something like a 
header containing the results of the probes. Then this header could be 
included in whatever software wants to manage BPF objects, and that 
software can then decide to call (or not to call) libbpf functions in 
one way or another, depending on the features supported by the system. 
It means you have to recompile your program for the target system, or at 
least to load probe results as a configuration file somehow.

I see it as the best approach for supported program or map types for 
example, because I do not believe we should prevent a user to attempt to 
load a given program type with libbpf if they want to, even if probes 
told us this program type is not supported. But on the other hand, 
fixing up the calls as you did for program names is probably easier if 
there are probes in libbpf, so it can just be adjusted at runtime... I 
just feel concerned that they might grow too big in time, your patch 
means that we now have two more programs to load each time we call 
bpf_object__load().

I'll try to post a bpftool patch with my probes next week (and CC you), 
so that we can discuss further about this.

Quentin
Vlad Dumitrescu Nov. 26, 2018, 7:08 p.m. UTC | #11
On Fri, Nov 23, 2018 at 2:51 AM Quentin Monnet
<quentin.monnet@netronome.com> wrote:
>
> 2018-11-21 09:28 UTC-0800 ~ Stanislav Fomichev <sdf@fomichev.me>
> > On 11/21, Quentin Monnet wrote:
> >> 2018-11-20 15:26 UTC-0800 ~ Stanislav Fomichev <sdf@fomichev.me>
> >>> On 11/20, Alexei Starovoitov wrote:
> >>>> On Wed, Nov 21, 2018 at 12:18:57AM +0100, Daniel Borkmann wrote:
> >>>>> On 11/21/2018 12:04 AM, Alexei Starovoitov wrote:
> >>>>>> On Tue, Nov 20, 2018 at 01:19:05PM -0800, Stanislav Fomichev wrote:
> >>>>>>> On 11/20, Alexei Starovoitov wrote:
> >>>>>>>> On Mon, Nov 19, 2018 at 04:46:25PM -0800, Stanislav Fomichev wrote:
> >>>>>>>>> [Recent commit 23499442c319 ("bpf: libbpf: retry map creation without
> >>>>>>>>> the name") fixed this issue for maps, let's do the same for programs.]
> >>>>>>>>>
> >>>>>>>>> Since commit 88cda1c9da02 ("bpf: libbpf: Provide basic API support
> >>>>>>>>> to specify BPF obj name"), libbpf unconditionally sets bpf_attr->name
> >>>>>>>>> for programs. Pre v4.14 kernels don't know about programs names and
> >>>>>>>>> return an error about unexpected non-zero data. Retry sys_bpf without
> >>>>>>>>> a program name to cover older kernels.
> >>>>>>>>>
> >>>>>>>>> Signed-off-by: Stanislav Fomichev <sdf@google.com>
> >>>>>>>>> ---
> >>>>>>>>>   tools/lib/bpf/bpf.c | 10 ++++++++++
> >>>>>>>>>   1 file changed, 10 insertions(+)
> >>>>>>>>>
> >>>>>>>>> diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
> >>>>>>>>> index 961e1b9fc592..cbe9d757c646 100644
> >>>>>>>>> --- a/tools/lib/bpf/bpf.c
> >>>>>>>>> +++ b/tools/lib/bpf/bpf.c
> >>>>>>>>> @@ -212,6 +212,16 @@ int bpf_load_program_xattr(const struct bpf_load_program_attr *load_attr,
> >>>>>>>>>       if (fd >= 0 || !log_buf || !log_buf_sz)
> >>>>>>>>>               return fd;
> >>>>>>>>>
> >>>>>>>>> +     if (fd < 0 && errno == E2BIG && load_attr->name) {
> >>>>>>>>> +             /* Retry the same syscall, but without the name.
> >>>>>>>>> +              * Pre v4.14 kernels don't support prog names.
> >>>>>>>>> +              */
> >>>>>>>>
> >>>>>>>> I'm afraid that will put unnecessary stress on the kernel.
> >>>>>>>> This check needs to be tighter.
> >>>>>>>> Like E2BIG and anything in the log_buf probably means that
> >>>>>>>> E2BIG came from the verifier and nothing to do with prog_name.
> >>>>>>>> Asking kernel to repeat is an unnecessary work.
> >>>>>>>>
> >>>>>>>> In general we need to think beyond this single prog_name field.
> >>>>>>>> There are bunch of other fields in bpf_load_program_xattr() and older kernels
> >>>>>>>> won't support them. Are we going to zero them out one by one
> >>>>>>>> and retry? I don't think that would be practical.
> >>>>>>> I general, we don't want to zero anything out. However,
> >>>>>>> for this particular problem the rationale is the following:
> >>>>>>> In commit 88cda1c9da02 we started unconditionally setting {prog,map}->name
> >>>>>>> from the 'higher' libbpfc layer which breaks users on the older kernels.
> >>>>>>>
> >>>>>>>> Also libbpf silently ignoring prog_name is not great for debugging.
> >>>>>>>> A warning is needed.
> >>>>>>>> But it cannot be done out of lib/bpf/bpf.c, since it's a set of syscall
> >>>>>>>> wrappers.
> >>>>>>>> Imo such "old kernel -> lets retry" feature should probably be done
> >>>>>>>> at lib/bpf/libbpf.c level. inside load_program().
> >>>>>>> For maps bpftools calls bpf_create_map_xattr directly, that's why
> >>>>>>> for maps I did the retry on the lower level (and why for programs I initially
> >>>>>>> thought about doing the same). However, in this case maybe asking
> >>>>>>> user to omit 'name' argument might be a better option.
> >>>>>>>
> >>>>>>> For program names, I agree, we might think about doing it on the higher
> >>>>>>> level (although I'm not sure whether we want to have different API
> >>>>>>> expectations, i.e. bpf_create_map_xattr ignoring the name and
> >>>>>>> bpf_load_program_xattr not ignoring the name).
> >>>>>>>
> >>>>>>> So given that rationale above, what do you think is the best way to
> >>>>>>> move forward?
> >>>>>>> 1. Same patch, but tighten the retry check inside bpf_load_program_xattr ?
> >>>>>>> 2. Move this retry logic into load_program and have different handling
> >>>>>>>     for bpf_create_map_xattr vs bpf_load_program_xattr ?
> >>>>>>> 3. Do 2 and move the retry check for maps from bpf_create_map_xattr
> >>>>>>>     into bpf_object__create_maps ?
> >>>>>>>
> >>>>>>> (I'm slightly leaning towards #3)
> >>>>>>
> >>>>>> me too. I think it's cleaner for maps to do it in
> >>>>>> bpf_object__create_maps().
> >>>>>> Originally bpf.c was envisioned to be a thin layer on top of bpf syscall.
> >>>>>> Whereas 'smart bits' would go into libbpf.c
> >>>>>
> >>>>> Can't we create in bpf_object__load() a small helper bpf_object__probe_caps()
> >>>>> which would figure this out _once_ upon start with a few things to probe for
> >>>>> availability in the underlying kernel for maps and programs? E.g. programs
> >>>>> it could try to inject a tiny 'r0 = 0; exit' snippet where we figure out
> >>>>> things like prog name support etc. Given underlying kernel doesn't change, we
> >>>>> would only try this once and it doesn't require fallback every time.
> >>>>
> >>>> +1. great idea!
> >>> Sounds good, let me try to do it.
> >>>
> >>> It sounds more like a recent LPC proposal/idea to have some sys_bpf option
> >>> to query BPF features. This new bpf_object__probe_caps can probably query
> >>> that in the future if we eventually add support for it.
> >>>
> >>
> >> Hi,
> >>
> >> LPC proposal indeed. I've been working on implementing this kind of
> >> probes in bpftool. I don't probe name support for now (but I can
> >> certainly add it), but I detect supported program types, map types,
> >> header functions, and a couple of other parameters. The idea (initially
> >> from Daniel) was to dump "#define" declarations that could later be
> >> included in a header file and used for a BPF project (or alternatively,
> >> JSON output).
> > Oh, nice, I didn't know someone was already working on it!
> >
> >> I felt like bpftool was maybe a better place to do it, as the set of
> >> probes may grow large (all types, helpers, etc). It might have
> >> consequences on the running system: for example, if I don't raise the
> >> rlimit in bpftool before starting the probes, a good half of them fail
> >> because of consecutive program creation and reclaim delay for locked
> >> memory usage.
> > Should we aim for something like build system feature checks? Where we
> > preserve these probe results between bpftool/libbpf invocations so we
> > don't re-run them again?
> >
> >> So should we start adding probes to libbpf and should I move mine to the
> >> lib as well, or should the one in the v3 of this series be directed to
> >> something like bpftool instead?
> > We need them (well, at least the name checks) for libbpf because that's
> > what we link against and what we use to load the programs, bpftool is
> > less of an issue right now. But my patch was mostly a hackish solution
> > until we get the real feature checks :-)
>
> Hi Stanislav,
> Apologies for the delayed answer, I have been travelling.
>
> I don't know if the probes should be "shared" with libbpf. What I had in
> mind was rather to run them with bpftool and to produce something like a
> header containing the results of the probes. Then this header could be
> included in whatever software wants to manage BPF objects, and that
> software can then decide to call (or not to call) libbpf functions in
> one way or another, depending on the features supported by the system.
> It means you have to recompile your program for the target system, or at
> least to load probe results as a configuration file somehow.

Will this work for what Stanislav is trying to solve? The 'whatever
software wants to manage BPF objects' (WS) has no option to choose
whether the map and program names are passed to kernel. The libbpf API
used by WS, in this case, is bpf_object__load(), which internally
decides when names are passed.

>
> I see it as the best approach for supported program or map types for
> example, because I do not believe we should prevent a user to attempt to
> load a given program type with libbpf if they want to, even if probes
> told us this program type is not supported. But on the other hand,
> fixing up the calls as you did for program names is probably easier if
> there are probes in libbpf, so it can just be adjusted at runtime... I
> just feel concerned that they might grow too big in time, your patch
> means that we now have two more programs to load each time we call
> bpf_object__load().
>
> I'll try to post a bpftool patch with my probes next week (and CC you),
> so that we can discuss further about this.
>
> Quentin
Quentin Monnet Nov. 26, 2018, 7:45 p.m. UTC | #12
2018-11-26 11:08 UTC-0800 ~ Vlad Dumitrescu <vlad@dumitrescu.ro>
> On Fri, Nov 23, 2018 at 2:51 AM Quentin Monnet
> <quentin.monnet@netronome.com> wrote:
>>
>> 2018-11-21 09:28 UTC-0800 ~ Stanislav Fomichev <sdf@fomichev.me>
>>> On 11/21, Quentin Monnet wrote:
>>>> 2018-11-20 15:26 UTC-0800 ~ Stanislav Fomichev <sdf@fomichev.me>
>>>>> On 11/20, Alexei Starovoitov wrote:
>>>>>> On Wed, Nov 21, 2018 at 12:18:57AM +0100, Daniel Borkmann wrote:
>>>>>>> On 11/21/2018 12:04 AM, Alexei Starovoitov wrote:
>>>>>>>> On Tue, Nov 20, 2018 at 01:19:05PM -0800, Stanislav Fomichev wrote:
>>>>>>>>> On 11/20, Alexei Starovoitov wrote:
>>>>>>>>>> On Mon, Nov 19, 2018 at 04:46:25PM -0800, Stanislav Fomichev wrote:
>>>>>>>>>>> [Recent commit 23499442c319 ("bpf: libbpf: retry map creation without
>>>>>>>>>>> the name") fixed this issue for maps, let's do the same for programs.]
>>>>>>>>>>>
>>>>>>>>>>> Since commit 88cda1c9da02 ("bpf: libbpf: Provide basic API support
>>>>>>>>>>> to specify BPF obj name"), libbpf unconditionally sets bpf_attr->name
>>>>>>>>>>> for programs. Pre v4.14 kernels don't know about programs names and
>>>>>>>>>>> return an error about unexpected non-zero data. Retry sys_bpf without
>>>>>>>>>>> a program name to cover older kernels.
>>>>>>>>>>>
>>>>>>>>>>> Signed-off-by: Stanislav Fomichev <sdf@google.com>
>>>>>>>>>>> ---
>>>>>>>>>>>   tools/lib/bpf/bpf.c | 10 ++++++++++
>>>>>>>>>>>   1 file changed, 10 insertions(+)
>>>>>>>>>>>
>>>>>>>>>>> diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
>>>>>>>>>>> index 961e1b9fc592..cbe9d757c646 100644
>>>>>>>>>>> --- a/tools/lib/bpf/bpf.c
>>>>>>>>>>> +++ b/tools/lib/bpf/bpf.c
>>>>>>>>>>> @@ -212,6 +212,16 @@ int bpf_load_program_xattr(const struct bpf_load_program_attr *load_attr,
>>>>>>>>>>>       if (fd >= 0 || !log_buf || !log_buf_sz)
>>>>>>>>>>>               return fd;
>>>>>>>>>>>
>>>>>>>>>>> +     if (fd < 0 && errno == E2BIG && load_attr->name) {
>>>>>>>>>>> +             /* Retry the same syscall, but without the name.
>>>>>>>>>>> +              * Pre v4.14 kernels don't support prog names.
>>>>>>>>>>> +              */
>>>>>>>>>>
>>>>>>>>>> I'm afraid that will put unnecessary stress on the kernel.
>>>>>>>>>> This check needs to be tighter.
>>>>>>>>>> Like E2BIG and anything in the log_buf probably means that
>>>>>>>>>> E2BIG came from the verifier and nothing to do with prog_name.
>>>>>>>>>> Asking kernel to repeat is an unnecessary work.
>>>>>>>>>>
>>>>>>>>>> In general we need to think beyond this single prog_name field.
>>>>>>>>>> There are bunch of other fields in bpf_load_program_xattr() and older kernels
>>>>>>>>>> won't support them. Are we going to zero them out one by one
>>>>>>>>>> and retry? I don't think that would be practical.
>>>>>>>>> I general, we don't want to zero anything out. However,
>>>>>>>>> for this particular problem the rationale is the following:
>>>>>>>>> In commit 88cda1c9da02 we started unconditionally setting {prog,map}->name
>>>>>>>>> from the 'higher' libbpfc layer which breaks users on the older kernels.
>>>>>>>>>
>>>>>>>>>> Also libbpf silently ignoring prog_name is not great for debugging.
>>>>>>>>>> A warning is needed.
>>>>>>>>>> But it cannot be done out of lib/bpf/bpf.c, since it's a set of syscall
>>>>>>>>>> wrappers.
>>>>>>>>>> Imo such "old kernel -> lets retry" feature should probably be done
>>>>>>>>>> at lib/bpf/libbpf.c level. inside load_program().
>>>>>>>>> For maps bpftools calls bpf_create_map_xattr directly, that's why
>>>>>>>>> for maps I did the retry on the lower level (and why for programs I initially
>>>>>>>>> thought about doing the same). However, in this case maybe asking
>>>>>>>>> user to omit 'name' argument might be a better option.
>>>>>>>>>
>>>>>>>>> For program names, I agree, we might think about doing it on the higher
>>>>>>>>> level (although I'm not sure whether we want to have different API
>>>>>>>>> expectations, i.e. bpf_create_map_xattr ignoring the name and
>>>>>>>>> bpf_load_program_xattr not ignoring the name).
>>>>>>>>>
>>>>>>>>> So given that rationale above, what do you think is the best way to
>>>>>>>>> move forward?
>>>>>>>>> 1. Same patch, but tighten the retry check inside bpf_load_program_xattr ?
>>>>>>>>> 2. Move this retry logic into load_program and have different handling
>>>>>>>>>     for bpf_create_map_xattr vs bpf_load_program_xattr ?
>>>>>>>>> 3. Do 2 and move the retry check for maps from bpf_create_map_xattr
>>>>>>>>>     into bpf_object__create_maps ?
>>>>>>>>>
>>>>>>>>> (I'm slightly leaning towards #3)
>>>>>>>>
>>>>>>>> me too. I think it's cleaner for maps to do it in
>>>>>>>> bpf_object__create_maps().
>>>>>>>> Originally bpf.c was envisioned to be a thin layer on top of bpf syscall.
>>>>>>>> Whereas 'smart bits' would go into libbpf.c
>>>>>>>
>>>>>>> Can't we create in bpf_object__load() a small helper bpf_object__probe_caps()
>>>>>>> which would figure this out _once_ upon start with a few things to probe for
>>>>>>> availability in the underlying kernel for maps and programs? E.g. programs
>>>>>>> it could try to inject a tiny 'r0 = 0; exit' snippet where we figure out
>>>>>>> things like prog name support etc. Given underlying kernel doesn't change, we
>>>>>>> would only try this once and it doesn't require fallback every time.
>>>>>>
>>>>>> +1. great idea!
>>>>> Sounds good, let me try to do it.
>>>>>
>>>>> It sounds more like a recent LPC proposal/idea to have some sys_bpf option
>>>>> to query BPF features. This new bpf_object__probe_caps can probably query
>>>>> that in the future if we eventually add support for it.
>>>>>
>>>>
>>>> Hi,
>>>>
>>>> LPC proposal indeed. I've been working on implementing this kind of
>>>> probes in bpftool. I don't probe name support for now (but I can
>>>> certainly add it), but I detect supported program types, map types,
>>>> header functions, and a couple of other parameters. The idea (initially
>>>> from Daniel) was to dump "#define" declarations that could later be
>>>> included in a header file and used for a BPF project (or alternatively,
>>>> JSON output).
>>> Oh, nice, I didn't know someone was already working on it!
>>>
>>>> I felt like bpftool was maybe a better place to do it, as the set of
>>>> probes may grow large (all types, helpers, etc). It might have
>>>> consequences on the running system: for example, if I don't raise the
>>>> rlimit in bpftool before starting the probes, a good half of them fail
>>>> because of consecutive program creation and reclaim delay for locked
>>>> memory usage.
>>> Should we aim for something like build system feature checks? Where we
>>> preserve these probe results between bpftool/libbpf invocations so we
>>> don't re-run them again?
>>>
>>>> So should we start adding probes to libbpf and should I move mine to the
>>>> lib as well, or should the one in the v3 of this series be directed to
>>>> something like bpftool instead?
>>> We need them (well, at least the name checks) for libbpf because that's
>>> what we link against and what we use to load the programs, bpftool is
>>> less of an issue right now. But my patch was mostly a hackish solution
>>> until we get the real feature checks :-)
>>
>> Hi Stanislav,
>> Apologies for the delayed answer, I have been travelling.
>>
>> I don't know if the probes should be "shared" with libbpf. What I had in
>> mind was rather to run them with bpftool and to produce something like a
>> header containing the results of the probes. Then this header could be
>> included in whatever software wants to manage BPF objects, and that
>> software can then decide to call (or not to call) libbpf functions in
>> one way or another, depending on the features supported by the system.
>> It means you have to recompile your program for the target system, or at
>> least to load probe results as a configuration file somehow.
> 
> Will this work for what Stanislav is trying to solve? The 'whatever
> software wants to manage BPF objects' (WS) has no option to choose
> whether the map and program names are passed to kernel. The libbpf API
> used by WS, in this case, is bpf_object__load(), which internally
> decides when names are passed.

Hm that's correct. I had in mind something like
bpf_load_program_xattr(), but it's true that with bpf_object__load() you
do not get to choose if you pass the name or not. Well I suppose we'll
keep some probes in libbpf, bpftool might not be a solution in that case
indeed.

Thanks,
Quentin
diff mbox series

Patch

diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index 961e1b9fc592..cbe9d757c646 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -212,6 +212,16 @@  int bpf_load_program_xattr(const struct bpf_load_program_attr *load_attr,
 	if (fd >= 0 || !log_buf || !log_buf_sz)
 		return fd;
 
+	if (fd < 0 && errno == E2BIG && load_attr->name) {
+		/* Retry the same syscall, but without the name.
+		 * Pre v4.14 kernels don't support prog names.
+		 */
+		memset(attr.prog_name, 0, sizeof(attr.prog_name));
+		fd = sys_bpf(BPF_PROG_LOAD, &attr, sizeof(attr));
+		if (fd >= 0 || !log_buf || !log_buf_sz)
+			return fd;
+	}
+
 	/* Try again with log */
 	attr.log_buf = ptr_to_u64(log_buf);
 	attr.log_size = log_buf_sz;