[bpf-next,v1,15/19] tools/libbpf: add bpf_iter support

Message ID	20200427201252.2996037-1-yhs@fb.com
State	Changes Requested
Delegated to:	BPF Maintainers
Headers	show Return-Path: <netdev-owner@vger.kernel.org> Smtp-Origin-Hostprefix: devbig From: Yonghong Song <yhs@fb.com> Smtp-Origin-Hostname: devbig003.ftw2.facebook.com To: Andrii Nakryiko <andriin@fb.com>, <bpf@vger.kernel.org>, Martin KaFai Lau <kafai@fb.com>, <netdev@vger.kernel.org> CC: Alexei Starovoitov <ast@fb.com>, Daniel Borkmann <daniel@iogearbox.net>, <kernel-team@fb.com> Smtp-Origin-Cluster: ftw2c04 Subject: [PATCH bpf-next v1 15/19] tools/libbpf: add bpf_iter support Date: Mon, 27 Apr 2020 13:12:52 -0700 Message-ID: <20200427201252.2996037-1-yhs@fb.com> In-Reply-To: <20200427201235.2994549-1-yhs@fb.com> References: <20200427201235.2994549-1-yhs@fb.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain Sender: netdev-owner@vger.kernel.org Precedence: bulk
Series	bpf: implement bpf iterator for kernel data \| expand [bpf-next,v1,00/19] bpf: implement bpf iterator for kernel data [bpf-next,v1,01/19] net: refactor net assignment for seq_net_private structure [bpf-next,v1,02/19] bpf: implement an interface to register bpf_iter targets [bpf-next,v1,03/19] bpf: add bpf_map iterator [bpf-next,v1,04/19] bpf: allow loading of a bpf_iter program [bpf-next,v1,05/19] bpf: support bpf tracing/iter programs for BPF_LINK_CREATE [bpf-next,v1,06/19] bpf: support bpf tracing/iter programs for BPF_LINK_UPDATE [bpf-next,v1,07/19] bpf: create anonymous bpf iterator [bpf-next,v1,08/19] bpf: create file bpf iterator [bpf-next,v1,09/19] bpf: add PTR_TO_BTF_ID_OR_NULL support [bpf-next,v1,10/19] bpf: add netlink and ipv6_route targets [bpf-next,v1,11/19] bpf: add task and task/file targets [bpf-next,v1,12/19] bpf: add bpf_seq_printf and bpf_seq_write helpers [bpf-next,v1,13/19] bpf: handle spilled PTR_TO_BTF_ID properly when checking stack_boundary [bpf-next,v1,14/19] bpf: support variable length array in tracing programs [bpf-next,v1,15/19] tools/libbpf: add bpf_iter support [bpf-next,v1,16/19] tools/bpftool: add bpf_iter support for bptool [bpf-next,v1,17/19] tools/bpf: selftests: add iterator programs for ipv6_route and netlink [bpf-next,v1,18/19] tools/bpf: selftests: add iter progs for bpf_map/task/task_file [bpf-next,v1,19/19] tools/bpf: selftests: add bpf_iter selftests

Message ID

20200427201252.2996037-1-yhs@fb.com

State

Changes Requested

Delegated to:

BPF Maintainers

Headers

Smtp-Origin-Hostprefix: devbig
From: Yonghong Song <yhs@fb.com>
Smtp-Origin-Hostname: devbig003.ftw2.facebook.com
To: Andrii Nakryiko <andriin@fb.com>, <bpf@vger.kernel.org>,
        Martin KaFai Lau <kafai@fb.com>, <netdev@vger.kernel.org>
CC: Alexei Starovoitov <ast@fb.com>,
        Daniel Borkmann <daniel@iogearbox.net>, <kernel-team@fb.com>
Smtp-Origin-Cluster: ftw2c04
Subject: [PATCH bpf-next v1 15/19] tools/libbpf: add bpf_iter support
Date: Mon, 27 Apr 2020 13:12:52 -0700
Message-ID: <20200427201252.2996037-1-yhs@fb.com>
In-Reply-To: <20200427201235.2994549-1-yhs@fb.com>
References: <20200427201235.2994549-1-yhs@fb.com>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain
Sender: netdev-owner@vger.kernel.org
Precedence: bulk

Series

bpf: implement bpf iterator for kernel data | expand

Commit Message

Yonghong Song April 27, 2020, 8:12 p.m. UTC

Three new libbpf APIs are added to support bpf_iter:
  - bpf_program__attach_iter
    Given a bpf program and additional parameters, which is
    none now, returns a bpf_link.
  - bpf_link__create_iter
    Given a bpf_link, create a bpf_iter and return a fd
    so user can then do read() to get seq_file output data.
  - bpf_iter_create
    syscall level API to create a bpf iterator.

Two macros, BPF_SEQ_PRINTF0 and BPF_SEQ_PRINTF, are also introduced.
These two macros can help bpf program writers with
nicer bpf_seq_printf syntax similar to the kernel one.

Signed-off-by: Yonghong Song <yhs@fb.com>
---
 tools/lib/bpf/bpf.c         | 11 +++++++
 tools/lib/bpf/bpf.h         |  2 ++
 tools/lib/bpf/bpf_tracing.h | 23 ++++++++++++++
 tools/lib/bpf/libbpf.c      | 60 +++++++++++++++++++++++++++++++++++++
 tools/lib/bpf/libbpf.h      | 11 +++++++
 tools/lib/bpf/libbpf.map    |  7 +++++
 6 files changed, 114 insertions(+)

Comments

Andrii Nakryiko April 30, 2020, 1:41 a.m. UTC | #1

On Mon, Apr 27, 2020 at 1:17 PM Yonghong Song <yhs@fb.com> wrote:
>
> Three new libbpf APIs are added to support bpf_iter:
>   - bpf_program__attach_iter
>     Given a bpf program and additional parameters, which is
>     none now, returns a bpf_link.
>   - bpf_link__create_iter
>     Given a bpf_link, create a bpf_iter and return a fd
>     so user can then do read() to get seq_file output data.
>   - bpf_iter_create
>     syscall level API to create a bpf iterator.
>
> Two macros, BPF_SEQ_PRINTF0 and BPF_SEQ_PRINTF, are also introduced.
> These two macros can help bpf program writers with
> nicer bpf_seq_printf syntax similar to the kernel one.
>
> Signed-off-by: Yonghong Song <yhs@fb.com>
> ---
>  tools/lib/bpf/bpf.c         | 11 +++++++
>  tools/lib/bpf/bpf.h         |  2 ++
>  tools/lib/bpf/bpf_tracing.h | 23 ++++++++++++++
>  tools/lib/bpf/libbpf.c      | 60 +++++++++++++++++++++++++++++++++++++
>  tools/lib/bpf/libbpf.h      | 11 +++++++
>  tools/lib/bpf/libbpf.map    |  7 +++++
>  6 files changed, 114 insertions(+)
>
> diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
> index 5cc1b0785d18..7ffd6c0ad95f 100644
> --- a/tools/lib/bpf/bpf.c
> +++ b/tools/lib/bpf/bpf.c
> @@ -619,6 +619,17 @@ int bpf_link_update(int link_fd, int new_prog_fd,
>         return sys_bpf(BPF_LINK_UPDATE, &attr, sizeof(attr));
>  }
>
> +int bpf_iter_create(int link_fd, unsigned int flags)

Do you envision anything more than just flags being passed for
bpf_iter_create? I wonder if we should just go ahead with options
struct here?

> +{
> +       union bpf_attr attr;
> +
> +       memset(&attr, 0, sizeof(attr));
> +       attr.iter_create.link_fd = link_fd;
> +       attr.iter_create.flags = flags;
> +
> +       return sys_bpf(BPF_ITER_CREATE, &attr, sizeof(attr));
> +}
> +

[...]

> +/*
> + * BPF_SEQ_PRINTF to wrap bpf_seq_printf to-be-printed values
> + * in a structure. BPF_SEQ_PRINTF0 is a simple wrapper for
> + * bpf_seq_printf().
> + */
> +#define BPF_SEQ_PRINTF0(seq, fmt)                                      \
> +       ({                                                              \
> +               int ret = bpf_seq_printf(seq, fmt, sizeof(fmt),         \
> +                                        (void *)0, 0);                 \
> +               ret;                                                    \
> +       })
> +
> +#define BPF_SEQ_PRINTF(seq, fmt, args...)                              \

You can unify BPF_SEQ_PRINTF and BPF_SEQ_PRINTF0 by using
___bpf_empty() macro. See bpf_core_read.h for similar use case.
Specifically, look at ___empty (equivalent of ___bpf_empty) and
___core_read, ___core_read0, ___core_readN macro.

> +       ({                                                              \
> +               _Pragma("GCC diagnostic push")                          \
> +               _Pragma("GCC diagnostic ignored \"-Wint-conversion\"")  \
> +               __u64 param[___bpf_narg(args)] = { args };              \

Do you need to provide the size of array here? If you omit
__bpf_narg(args), wouldn't compiler automatically calculate the right
size?

Also, can you please use "unsigned long long" to not have any implicit
dependency on __u64 being defined?

> +               _Pragma("GCC diagnostic pop")                           \
> +               int ret = bpf_seq_printf(seq, fmt, sizeof(fmt),         \
> +                                        param, sizeof(param));         \
> +               ret;                                                    \
> +       })
> +
>  #endif
> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> index 8e1dc6980fac..ffdc4d8e0cc0 100644
> --- a/tools/lib/bpf/libbpf.c
> +++ b/tools/lib/bpf/libbpf.c
> @@ -6366,6 +6366,9 @@ static const struct bpf_sec_def section_defs[] = {
>                 .is_attach_btf = true,
>                 .expected_attach_type = BPF_LSM_MAC,
>                 .attach_fn = attach_lsm),
> +       SEC_DEF("iter/", TRACING,
> +               .expected_attach_type = BPF_TRACE_ITER,
> +               .is_attach_btf = true),

It would be nice to implement auto-attach capabilities (similar to
fentry/fexit, lsm and raw_tracepoint). Section name should have enough
information for this, no?

>         BPF_PROG_SEC("xdp",                     BPF_PROG_TYPE_XDP),
>         BPF_PROG_SEC("perf_event",              BPF_PROG_TYPE_PERF_EVENT),
>         BPF_PROG_SEC("lwt_in",                  BPF_PROG_TYPE_LWT_IN),
> @@ -6629,6 +6632,7 @@ static int bpf_object__collect_struct_ops_map_reloc(struct bpf_object *obj,
>

[...]

> +
> +       link = calloc(1, sizeof(*link));
> +       if (!link)
> +               return ERR_PTR(-ENOMEM);
> +       link->detach = &bpf_link__detach_fd;
> +
> +       attach_type = bpf_program__get_expected_attach_type(prog);

Given you know it has to be BPF_TRACE_ITER, it's better to explicitly
specify that. If provided program wasn't loaded with correct
expected_attach_type, kernel will reject it. But if you don't do it,
then you can accidentally create some other type of bpf_link.

> +       link_fd = bpf_link_create(prog_fd, 0, attach_type, NULL);
> +       if (link_fd < 0) {
> +               link_fd = -errno;
> +               free(link);
> +               pr_warn("program '%s': failed to attach to iterator: %s\n",
> +                       bpf_program__title(prog, false),
> +                       libbpf_strerror_r(link_fd, errmsg, sizeof(errmsg)));
> +               return ERR_PTR(link_fd);
> +       }
> +       link->fd = link_fd;
> +       return link;
> +}
> +
> +int bpf_link__create_iter(struct bpf_link *link, unsigned int flags)
> +{

Same question as for low-level bpf_link_create(). If we expect the
need to extend optional things in the future, I'd add opts right now.

But I wonder if bpf_link__create_iter() provides any additional value
beyond bpf_iter_create(). Maybe let's not add it (yet)?

> +       char errmsg[STRERR_BUFSIZE];
> +       int iter_fd;
> +
> +       iter_fd = bpf_iter_create(bpf_link__fd(link), flags);
> +       if (iter_fd < 0) {
> +               iter_fd = -errno;
> +               pr_warn("failed to create an iterator: %s\n",
> +                       libbpf_strerror_r(iter_fd, errmsg, sizeof(errmsg)));
> +       }
> +
> +       return iter_fd;
> +}
> +

[...]

Yonghong Song May 2, 2020, 7:17 a.m. UTC | #2

On 4/29/20 6:41 PM, Andrii Nakryiko wrote:
> On Mon, Apr 27, 2020 at 1:17 PM Yonghong Song <yhs@fb.com> wrote:
>>
>> Three new libbpf APIs are added to support bpf_iter:
>>    - bpf_program__attach_iter
>>      Given a bpf program and additional parameters, which is
>>      none now, returns a bpf_link.
>>    - bpf_link__create_iter
>>      Given a bpf_link, create a bpf_iter and return a fd
>>      so user can then do read() to get seq_file output data.
>>    - bpf_iter_create
>>      syscall level API to create a bpf iterator.
>>
>> Two macros, BPF_SEQ_PRINTF0 and BPF_SEQ_PRINTF, are also introduced.
>> These two macros can help bpf program writers with
>> nicer bpf_seq_printf syntax similar to the kernel one.
>>
>> Signed-off-by: Yonghong Song <yhs@fb.com>
>> ---
>>   tools/lib/bpf/bpf.c         | 11 +++++++
>>   tools/lib/bpf/bpf.h         |  2 ++
>>   tools/lib/bpf/bpf_tracing.h | 23 ++++++++++++++
>>   tools/lib/bpf/libbpf.c      | 60 +++++++++++++++++++++++++++++++++++++
>>   tools/lib/bpf/libbpf.h      | 11 +++++++
>>   tools/lib/bpf/libbpf.map    |  7 +++++
>>   6 files changed, 114 insertions(+)
>>
>> diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
>> index 5cc1b0785d18..7ffd6c0ad95f 100644
>> --- a/tools/lib/bpf/bpf.c
>> +++ b/tools/lib/bpf/bpf.c
>> @@ -619,6 +619,17 @@ int bpf_link_update(int link_fd, int new_prog_fd,
>>          return sys_bpf(BPF_LINK_UPDATE, &attr, sizeof(attr));
>>   }
>>
>> +int bpf_iter_create(int link_fd, unsigned int flags)
> 
> Do you envision anything more than just flags being passed for
> bpf_iter_create? I wonder if we should just go ahead with options
> struct here?

I think most, if not all, parameters should go to link create.
This way, we can have the identical anon_iter through:
    link -> anon_iter
    link -> pinned file -> anon_iter

I do not really expect any more fields for bpf_iter_create.
The flags here is for potential future extension, which I
have no idea how it looks like.

> 
>> +{
>> +       union bpf_attr attr;
>> +
>> +       memset(&attr, 0, sizeof(attr));
>> +       attr.iter_create.link_fd = link_fd;
>> +       attr.iter_create.flags = flags;
>> +
>> +       return sys_bpf(BPF_ITER_CREATE, &attr, sizeof(attr));
>> +}
>> +
> 
> [...]
> 
>> +/*
>> + * BPF_SEQ_PRINTF to wrap bpf_seq_printf to-be-printed values
>> + * in a structure. BPF_SEQ_PRINTF0 is a simple wrapper for
>> + * bpf_seq_printf().
>> + */
>> +#define BPF_SEQ_PRINTF0(seq, fmt)                                      \
>> +       ({                                                              \
>> +               int ret = bpf_seq_printf(seq, fmt, sizeof(fmt),         \
>> +                                        (void *)0, 0);                 \
>> +               ret;                                                    \
>> +       })
>> +
>> +#define BPF_SEQ_PRINTF(seq, fmt, args...)                              \
> 
> You can unify BPF_SEQ_PRINTF and BPF_SEQ_PRINTF0 by using
> ___bpf_empty() macro. See bpf_core_read.h for similar use case.
> Specifically, look at ___empty (equivalent of ___bpf_empty) and
> ___core_read, ___core_read0, ___core_readN macro.

Thanks for the tip. Will try.

> 
>> +       ({                                                              \
>> +               _Pragma("GCC diagnostic push")                          \
>> +               _Pragma("GCC diagnostic ignored \"-Wint-conversion\"")  \
>> +               __u64 param[___bpf_narg(args)] = { args };              \
> 
> Do you need to provide the size of array here? If you omit
> __bpf_narg(args), wouldn't compiler automatically calculate the right
> size?
> 

Yes, compiler should calculate correct size.

> Also, can you please use "unsigned long long" to not have any implicit
> dependency on __u64 being defined?

Will do.

> 
>> +               _Pragma("GCC diagnostic pop")                           \
>> +               int ret = bpf_seq_printf(seq, fmt, sizeof(fmt),         \
>> +                                        param, sizeof(param));         \
>> +               ret;                                                    \
>> +       })
>> +
>>   #endif
>> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
>> index 8e1dc6980fac..ffdc4d8e0cc0 100644
>> --- a/tools/lib/bpf/libbpf.c
>> +++ b/tools/lib/bpf/libbpf.c
>> @@ -6366,6 +6366,9 @@ static const struct bpf_sec_def section_defs[] = {
>>                  .is_attach_btf = true,
>>                  .expected_attach_type = BPF_LSM_MAC,
>>                  .attach_fn = attach_lsm),
>> +       SEC_DEF("iter/", TRACING,
>> +               .expected_attach_type = BPF_TRACE_ITER,
>> +               .is_attach_btf = true),
> 
> It would be nice to implement auto-attach capabilities (similar to
> fentry/fexit, lsm and raw_tracepoint). Section name should have enough
> information for this, no?

In the current form, yes, auto attach is possible.
But I am thinking we may soon have additional information
like map_id (appear in link_create) etc.
to make auto attach not possible. That is why
I implemented an explicit attach. is this assessment correct?

> 
>>          BPF_PROG_SEC("xdp",                     BPF_PROG_TYPE_XDP),
>>          BPF_PROG_SEC("perf_event",              BPF_PROG_TYPE_PERF_EVENT),
>>          BPF_PROG_SEC("lwt_in",                  BPF_PROG_TYPE_LWT_IN),
>> @@ -6629,6 +6632,7 @@ static int bpf_object__collect_struct_ops_map_reloc(struct bpf_object *obj,
>>
> 
> [...]
> 
>> +
>> +       link = calloc(1, sizeof(*link));
>> +       if (!link)
>> +               return ERR_PTR(-ENOMEM);
>> +       link->detach = &bpf_link__detach_fd;
>> +
>> +       attach_type = bpf_program__get_expected_attach_type(prog);
> 
> Given you know it has to be BPF_TRACE_ITER, it's better to explicitly
> specify that. If provided program wasn't loaded with correct
> expected_attach_type, kernel will reject it. But if you don't do it,
> then you can accidentally create some other type of bpf_link.

Yes, will do.

> 
>> +       link_fd = bpf_link_create(prog_fd, 0, attach_type, NULL);
>> +       if (link_fd < 0) {
>> +               link_fd = -errno;
>> +               free(link);
>> +               pr_warn("program '%s': failed to attach to iterator: %s\n",
>> +                       bpf_program__title(prog, false),
>> +                       libbpf_strerror_r(link_fd, errmsg, sizeof(errmsg)));
>> +               return ERR_PTR(link_fd);
>> +       }
>> +       link->fd = link_fd;
>> +       return link;
>> +}
>> +
>> +int bpf_link__create_iter(struct bpf_link *link, unsigned int flags)
>> +{
> 
> Same question as for low-level bpf_link_create(). If we expect the
> need to extend optional things in the future, I'd add opts right now.
> 
> But I wonder if bpf_link__create_iter() provides any additional value
> beyond bpf_iter_create(). Maybe let's not add it (yet)?

The only additional thing is better warning messsage.
Agree that is so marginal. Will drop it.

> 
>> +       char errmsg[STRERR_BUFSIZE];
>> +       int iter_fd;
>> +
>> +       iter_fd = bpf_iter_create(bpf_link__fd(link), flags);
>> +       if (iter_fd < 0) {
>> +               iter_fd = -errno;
>> +               pr_warn("failed to create an iterator: %s\n",
>> +                       libbpf_strerror_r(iter_fd, errmsg, sizeof(errmsg)));
>> +       }
>> +
>> +       return iter_fd;
>> +}
>> +
> 
> [...]
>

diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index 5cc1b0785d18..7ffd6c0ad95f 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -619,6 +619,17 @@  int bpf_link_update(int link_fd, int new_prog_fd,
 	return sys_bpf(BPF_LINK_UPDATE, &attr, sizeof(attr));
 }
 
+int bpf_iter_create(int link_fd, unsigned int flags)
+{
+	union bpf_attr attr;
+
+	memset(&attr, 0, sizeof(attr));
+	attr.iter_create.link_fd = link_fd;
+	attr.iter_create.flags = flags;
+
+	return sys_bpf(BPF_ITER_CREATE, &attr, sizeof(attr));
+}
+
 int bpf_prog_query(int target_fd, enum bpf_attach_type type, __u32 query_flags,
 		   __u32 *attach_flags, __u32 *prog_ids, __u32 *prog_cnt)
 {
diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
index 46d47afdd887..db9df303090e 100644
--- a/tools/lib/bpf/bpf.h
+++ b/tools/lib/bpf/bpf.h
@@ -187,6 +187,8 @@  struct bpf_link_update_opts {
 LIBBPF_API int bpf_link_update(int link_fd, int new_prog_fd,
 			       const struct bpf_link_update_opts *opts);
 
+LIBBPF_API int bpf_iter_create(int link_fd, unsigned int flags);
+
 struct bpf_prog_test_run_attr {
 	int prog_fd;
 	int repeat;
diff --git a/tools/lib/bpf/bpf_tracing.h b/tools/lib/bpf/bpf_tracing.h
index f3f3c3fb98cb..4a6dffaa7e57 100644
--- a/tools/lib/bpf/bpf_tracing.h
+++ b/tools/lib/bpf/bpf_tracing.h
@@ -413,4 +413,27 @@  typeof(name(0)) name(struct pt_regs *ctx)				    \
 }									    \
 static __always_inline typeof(name(0)) ____##name(struct pt_regs *ctx, ##args)
 
+/*
+ * BPF_SEQ_PRINTF to wrap bpf_seq_printf to-be-printed values
+ * in a structure. BPF_SEQ_PRINTF0 is a simple wrapper for
+ * bpf_seq_printf().
+ */
+#define BPF_SEQ_PRINTF0(seq, fmt)					\
+	({								\
+		int ret = bpf_seq_printf(seq, fmt, sizeof(fmt),		\
+					 (void *)0, 0);			\
+		ret;							\
+	})
+
+#define BPF_SEQ_PRINTF(seq, fmt, args...)				\
+	({								\
+		_Pragma("GCC diagnostic push")				\
+		_Pragma("GCC diagnostic ignored \"-Wint-conversion\"")	\
+		__u64 param[___bpf_narg(args)] = { args };		\
+		_Pragma("GCC diagnostic pop")				\
+		int ret = bpf_seq_printf(seq, fmt, sizeof(fmt),		\
+					 param, sizeof(param));		\
+		ret;							\
+	})
+
 #endif
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 8e1dc6980fac..ffdc4d8e0cc0 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -6366,6 +6366,9 @@  static const struct bpf_sec_def section_defs[] = {
 		.is_attach_btf = true,
 		.expected_attach_type = BPF_LSM_MAC,
 		.attach_fn = attach_lsm),
+	SEC_DEF("iter/", TRACING,
+		.expected_attach_type = BPF_TRACE_ITER,
+		.is_attach_btf = true),
 	BPF_PROG_SEC("xdp",			BPF_PROG_TYPE_XDP),
 	BPF_PROG_SEC("perf_event",		BPF_PROG_TYPE_PERF_EVENT),
 	BPF_PROG_SEC("lwt_in",			BPF_PROG_TYPE_LWT_IN),
@@ -6629,6 +6632,7 @@  static int bpf_object__collect_struct_ops_map_reloc(struct bpf_object *obj,
 
 #define BTF_TRACE_PREFIX "btf_trace_"
 #define BTF_LSM_PREFIX "bpf_lsm_"
+#define BTF_ITER_PREFIX "__bpf_iter__"
 #define BTF_MAX_NAME_SIZE 128
 
 static int find_btf_by_prefix_kind(const struct btf *btf, const char *prefix,
@@ -6659,6 +6663,9 @@  static inline int __find_vmlinux_btf_id(struct btf *btf, const char *name,
 	else if (attach_type == BPF_LSM_MAC)
 		err = find_btf_by_prefix_kind(btf, BTF_LSM_PREFIX, name,
 					      BTF_KIND_FUNC);
+	else if (attach_type == BPF_TRACE_ITER)
+		err = find_btf_by_prefix_kind(btf, BTF_ITER_PREFIX, name,
+					      BTF_KIND_FUNC);
 	else
 		err = btf__find_by_name_kind(btf, name, BTF_KIND_FUNC);
 
@@ -7617,6 +7624,59 @@  bpf_program__attach_cgroup(struct bpf_program *prog, int cgroup_fd)
 	return link;
 }
 
+struct bpf_link *
+bpf_program__attach_iter(struct bpf_program *prog,
+			 const struct bpf_iter_attach_opts *opts)
+{
+	enum bpf_attach_type attach_type;
+	char errmsg[STRERR_BUFSIZE];
+	struct bpf_link *link;
+	int prog_fd, link_fd;
+
+	if (!OPTS_VALID(opts, bpf_iter_attach_opts))
+		return ERR_PTR(-EINVAL);
+
+	prog_fd = bpf_program__fd(prog);
+	if (prog_fd < 0) {
+		pr_warn("program '%s': can't attach before loaded\n",
+			bpf_program__title(prog, false));
+		return ERR_PTR(-EINVAL);
+	}
+
+	link = calloc(1, sizeof(*link));
+	if (!link)
+		return ERR_PTR(-ENOMEM);
+	link->detach = &bpf_link__detach_fd;
+
+	attach_type = bpf_program__get_expected_attach_type(prog);
+	link_fd = bpf_link_create(prog_fd, 0, attach_type, NULL);
+	if (link_fd < 0) {
+		link_fd = -errno;
+		free(link);
+		pr_warn("program '%s': failed to attach to iterator: %s\n",
+			bpf_program__title(prog, false),
+			libbpf_strerror_r(link_fd, errmsg, sizeof(errmsg)));
+		return ERR_PTR(link_fd);
+	}
+	link->fd = link_fd;
+	return link;
+}
+
+int bpf_link__create_iter(struct bpf_link *link, unsigned int flags)
+{
+	char errmsg[STRERR_BUFSIZE];
+	int iter_fd;
+
+	iter_fd = bpf_iter_create(bpf_link__fd(link), flags);
+	if (iter_fd < 0) {
+		iter_fd = -errno;
+		pr_warn("failed to create an iterator: %s\n",
+			libbpf_strerror_r(iter_fd, errmsg, sizeof(errmsg)));
+	}
+
+	return iter_fd;
+}
+
 struct bpf_link *bpf_program__attach(struct bpf_program *prog)
 {
 	const struct bpf_sec_def *sec_def;
diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index f1dacecb1619..abe5786fcab3 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -258,6 +258,17 @@  struct bpf_map;
 
 LIBBPF_API struct bpf_link *bpf_map__attach_struct_ops(struct bpf_map *map);
 
+struct bpf_iter_attach_opts {
+	size_t sz; /* size of this struct for forward/backward compatibility */
+};
+#define bpf_iter_attach_opts__last_field sz
+
+LIBBPF_API struct bpf_link *
+bpf_program__attach_iter(struct bpf_program *prog,
+			 const struct bpf_iter_attach_opts *opts);
+LIBBPF_API int
+bpf_link__create_iter(struct bpf_link *link, unsigned int flags);
+
 struct bpf_insn;
 
 /*
diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
index bb8831605b25..1cea36f9f2e2 100644
--- a/tools/lib/bpf/libbpf.map
+++ b/tools/lib/bpf/libbpf.map
@@ -254,3 +254,10 @@  LIBBPF_0.0.8 {
 		bpf_program__set_lsm;
 		bpf_set_link_xdp_fd_opts;
 } LIBBPF_0.0.7;
+
+LIBBPF_0.0.9 {
+	global:
+		bpf_link__create_iter;
+		bpf_program__attach_iter;
+		bpf_iter_create;
+} LIBBPF_0.0.8;

[bpf-next,v1,15/19] tools/libbpf: add bpf_iter support

Commit Message

Comments

Patch