Message ID | 20200427201252.2996037-1-yhs@fb.com |
---|---|
State | Changes Requested |
Delegated to: | BPF Maintainers |
Headers | show |
Series | bpf: implement bpf iterator for kernel data | expand |
On Mon, Apr 27, 2020 at 1:17 PM Yonghong Song <yhs@fb.com> wrote: > > Three new libbpf APIs are added to support bpf_iter: > - bpf_program__attach_iter > Given a bpf program and additional parameters, which is > none now, returns a bpf_link. > - bpf_link__create_iter > Given a bpf_link, create a bpf_iter and return a fd > so user can then do read() to get seq_file output data. > - bpf_iter_create > syscall level API to create a bpf iterator. > > Two macros, BPF_SEQ_PRINTF0 and BPF_SEQ_PRINTF, are also introduced. > These two macros can help bpf program writers with > nicer bpf_seq_printf syntax similar to the kernel one. > > Signed-off-by: Yonghong Song <yhs@fb.com> > --- > tools/lib/bpf/bpf.c | 11 +++++++ > tools/lib/bpf/bpf.h | 2 ++ > tools/lib/bpf/bpf_tracing.h | 23 ++++++++++++++ > tools/lib/bpf/libbpf.c | 60 +++++++++++++++++++++++++++++++++++++ > tools/lib/bpf/libbpf.h | 11 +++++++ > tools/lib/bpf/libbpf.map | 7 +++++ > 6 files changed, 114 insertions(+) > > diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c > index 5cc1b0785d18..7ffd6c0ad95f 100644 > --- a/tools/lib/bpf/bpf.c > +++ b/tools/lib/bpf/bpf.c > @@ -619,6 +619,17 @@ int bpf_link_update(int link_fd, int new_prog_fd, > return sys_bpf(BPF_LINK_UPDATE, &attr, sizeof(attr)); > } > > +int bpf_iter_create(int link_fd, unsigned int flags) Do you envision anything more than just flags being passed for bpf_iter_create? I wonder if we should just go ahead with options struct here? > +{ > + union bpf_attr attr; > + > + memset(&attr, 0, sizeof(attr)); > + attr.iter_create.link_fd = link_fd; > + attr.iter_create.flags = flags; > + > + return sys_bpf(BPF_ITER_CREATE, &attr, sizeof(attr)); > +} > + [...] > +/* > + * BPF_SEQ_PRINTF to wrap bpf_seq_printf to-be-printed values > + * in a structure. BPF_SEQ_PRINTF0 is a simple wrapper for > + * bpf_seq_printf(). > + */ > +#define BPF_SEQ_PRINTF0(seq, fmt) \ > + ({ \ > + int ret = bpf_seq_printf(seq, fmt, sizeof(fmt), \ > + (void *)0, 0); \ > + ret; \ > + }) > + > +#define BPF_SEQ_PRINTF(seq, fmt, args...) \ You can unify BPF_SEQ_PRINTF and BPF_SEQ_PRINTF0 by using ___bpf_empty() macro. See bpf_core_read.h for similar use case. Specifically, look at ___empty (equivalent of ___bpf_empty) and ___core_read, ___core_read0, ___core_readN macro. > + ({ \ > + _Pragma("GCC diagnostic push") \ > + _Pragma("GCC diagnostic ignored \"-Wint-conversion\"") \ > + __u64 param[___bpf_narg(args)] = { args }; \ Do you need to provide the size of array here? If you omit __bpf_narg(args), wouldn't compiler automatically calculate the right size? Also, can you please use "unsigned long long" to not have any implicit dependency on __u64 being defined? > + _Pragma("GCC diagnostic pop") \ > + int ret = bpf_seq_printf(seq, fmt, sizeof(fmt), \ > + param, sizeof(param)); \ > + ret; \ > + }) > + > #endif > diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c > index 8e1dc6980fac..ffdc4d8e0cc0 100644 > --- a/tools/lib/bpf/libbpf.c > +++ b/tools/lib/bpf/libbpf.c > @@ -6366,6 +6366,9 @@ static const struct bpf_sec_def section_defs[] = { > .is_attach_btf = true, > .expected_attach_type = BPF_LSM_MAC, > .attach_fn = attach_lsm), > + SEC_DEF("iter/", TRACING, > + .expected_attach_type = BPF_TRACE_ITER, > + .is_attach_btf = true), It would be nice to implement auto-attach capabilities (similar to fentry/fexit, lsm and raw_tracepoint). Section name should have enough information for this, no? > BPF_PROG_SEC("xdp", BPF_PROG_TYPE_XDP), > BPF_PROG_SEC("perf_event", BPF_PROG_TYPE_PERF_EVENT), > BPF_PROG_SEC("lwt_in", BPF_PROG_TYPE_LWT_IN), > @@ -6629,6 +6632,7 @@ static int bpf_object__collect_struct_ops_map_reloc(struct bpf_object *obj, > [...] > + > + link = calloc(1, sizeof(*link)); > + if (!link) > + return ERR_PTR(-ENOMEM); > + link->detach = &bpf_link__detach_fd; > + > + attach_type = bpf_program__get_expected_attach_type(prog); Given you know it has to be BPF_TRACE_ITER, it's better to explicitly specify that. If provided program wasn't loaded with correct expected_attach_type, kernel will reject it. But if you don't do it, then you can accidentally create some other type of bpf_link. > + link_fd = bpf_link_create(prog_fd, 0, attach_type, NULL); > + if (link_fd < 0) { > + link_fd = -errno; > + free(link); > + pr_warn("program '%s': failed to attach to iterator: %s\n", > + bpf_program__title(prog, false), > + libbpf_strerror_r(link_fd, errmsg, sizeof(errmsg))); > + return ERR_PTR(link_fd); > + } > + link->fd = link_fd; > + return link; > +} > + > +int bpf_link__create_iter(struct bpf_link *link, unsigned int flags) > +{ Same question as for low-level bpf_link_create(). If we expect the need to extend optional things in the future, I'd add opts right now. But I wonder if bpf_link__create_iter() provides any additional value beyond bpf_iter_create(). Maybe let's not add it (yet)? > + char errmsg[STRERR_BUFSIZE]; > + int iter_fd; > + > + iter_fd = bpf_iter_create(bpf_link__fd(link), flags); > + if (iter_fd < 0) { > + iter_fd = -errno; > + pr_warn("failed to create an iterator: %s\n", > + libbpf_strerror_r(iter_fd, errmsg, sizeof(errmsg))); > + } > + > + return iter_fd; > +} > + [...]
On 4/29/20 6:41 PM, Andrii Nakryiko wrote: > On Mon, Apr 27, 2020 at 1:17 PM Yonghong Song <yhs@fb.com> wrote: >> >> Three new libbpf APIs are added to support bpf_iter: >> - bpf_program__attach_iter >> Given a bpf program and additional parameters, which is >> none now, returns a bpf_link. >> - bpf_link__create_iter >> Given a bpf_link, create a bpf_iter and return a fd >> so user can then do read() to get seq_file output data. >> - bpf_iter_create >> syscall level API to create a bpf iterator. >> >> Two macros, BPF_SEQ_PRINTF0 and BPF_SEQ_PRINTF, are also introduced. >> These two macros can help bpf program writers with >> nicer bpf_seq_printf syntax similar to the kernel one. >> >> Signed-off-by: Yonghong Song <yhs@fb.com> >> --- >> tools/lib/bpf/bpf.c | 11 +++++++ >> tools/lib/bpf/bpf.h | 2 ++ >> tools/lib/bpf/bpf_tracing.h | 23 ++++++++++++++ >> tools/lib/bpf/libbpf.c | 60 +++++++++++++++++++++++++++++++++++++ >> tools/lib/bpf/libbpf.h | 11 +++++++ >> tools/lib/bpf/libbpf.map | 7 +++++ >> 6 files changed, 114 insertions(+) >> >> diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c >> index 5cc1b0785d18..7ffd6c0ad95f 100644 >> --- a/tools/lib/bpf/bpf.c >> +++ b/tools/lib/bpf/bpf.c >> @@ -619,6 +619,17 @@ int bpf_link_update(int link_fd, int new_prog_fd, >> return sys_bpf(BPF_LINK_UPDATE, &attr, sizeof(attr)); >> } >> >> +int bpf_iter_create(int link_fd, unsigned int flags) > > Do you envision anything more than just flags being passed for > bpf_iter_create? I wonder if we should just go ahead with options > struct here? I think most, if not all, parameters should go to link create. This way, we can have the identical anon_iter through: link -> anon_iter link -> pinned file -> anon_iter I do not really expect any more fields for bpf_iter_create. The flags here is for potential future extension, which I have no idea how it looks like. > >> +{ >> + union bpf_attr attr; >> + >> + memset(&attr, 0, sizeof(attr)); >> + attr.iter_create.link_fd = link_fd; >> + attr.iter_create.flags = flags; >> + >> + return sys_bpf(BPF_ITER_CREATE, &attr, sizeof(attr)); >> +} >> + > > [...] > >> +/* >> + * BPF_SEQ_PRINTF to wrap bpf_seq_printf to-be-printed values >> + * in a structure. BPF_SEQ_PRINTF0 is a simple wrapper for >> + * bpf_seq_printf(). >> + */ >> +#define BPF_SEQ_PRINTF0(seq, fmt) \ >> + ({ \ >> + int ret = bpf_seq_printf(seq, fmt, sizeof(fmt), \ >> + (void *)0, 0); \ >> + ret; \ >> + }) >> + >> +#define BPF_SEQ_PRINTF(seq, fmt, args...) \ > > You can unify BPF_SEQ_PRINTF and BPF_SEQ_PRINTF0 by using > ___bpf_empty() macro. See bpf_core_read.h for similar use case. > Specifically, look at ___empty (equivalent of ___bpf_empty) and > ___core_read, ___core_read0, ___core_readN macro. Thanks for the tip. Will try. > >> + ({ \ >> + _Pragma("GCC diagnostic push") \ >> + _Pragma("GCC diagnostic ignored \"-Wint-conversion\"") \ >> + __u64 param[___bpf_narg(args)] = { args }; \ > > Do you need to provide the size of array here? If you omit > __bpf_narg(args), wouldn't compiler automatically calculate the right > size? > Yes, compiler should calculate correct size. > Also, can you please use "unsigned long long" to not have any implicit > dependency on __u64 being defined? Will do. > >> + _Pragma("GCC diagnostic pop") \ >> + int ret = bpf_seq_printf(seq, fmt, sizeof(fmt), \ >> + param, sizeof(param)); \ >> + ret; \ >> + }) >> + >> #endif >> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c >> index 8e1dc6980fac..ffdc4d8e0cc0 100644 >> --- a/tools/lib/bpf/libbpf.c >> +++ b/tools/lib/bpf/libbpf.c >> @@ -6366,6 +6366,9 @@ static const struct bpf_sec_def section_defs[] = { >> .is_attach_btf = true, >> .expected_attach_type = BPF_LSM_MAC, >> .attach_fn = attach_lsm), >> + SEC_DEF("iter/", TRACING, >> + .expected_attach_type = BPF_TRACE_ITER, >> + .is_attach_btf = true), > > It would be nice to implement auto-attach capabilities (similar to > fentry/fexit, lsm and raw_tracepoint). Section name should have enough > information for this, no? In the current form, yes, auto attach is possible. But I am thinking we may soon have additional information like map_id (appear in link_create) etc. to make auto attach not possible. That is why I implemented an explicit attach. is this assessment correct? > >> BPF_PROG_SEC("xdp", BPF_PROG_TYPE_XDP), >> BPF_PROG_SEC("perf_event", BPF_PROG_TYPE_PERF_EVENT), >> BPF_PROG_SEC("lwt_in", BPF_PROG_TYPE_LWT_IN), >> @@ -6629,6 +6632,7 @@ static int bpf_object__collect_struct_ops_map_reloc(struct bpf_object *obj, >> > > [...] > >> + >> + link = calloc(1, sizeof(*link)); >> + if (!link) >> + return ERR_PTR(-ENOMEM); >> + link->detach = &bpf_link__detach_fd; >> + >> + attach_type = bpf_program__get_expected_attach_type(prog); > > Given you know it has to be BPF_TRACE_ITER, it's better to explicitly > specify that. If provided program wasn't loaded with correct > expected_attach_type, kernel will reject it. But if you don't do it, > then you can accidentally create some other type of bpf_link. Yes, will do. > >> + link_fd = bpf_link_create(prog_fd, 0, attach_type, NULL); >> + if (link_fd < 0) { >> + link_fd = -errno; >> + free(link); >> + pr_warn("program '%s': failed to attach to iterator: %s\n", >> + bpf_program__title(prog, false), >> + libbpf_strerror_r(link_fd, errmsg, sizeof(errmsg))); >> + return ERR_PTR(link_fd); >> + } >> + link->fd = link_fd; >> + return link; >> +} >> + >> +int bpf_link__create_iter(struct bpf_link *link, unsigned int flags) >> +{ > > Same question as for low-level bpf_link_create(). If we expect the > need to extend optional things in the future, I'd add opts right now. > > But I wonder if bpf_link__create_iter() provides any additional value > beyond bpf_iter_create(). Maybe let's not add it (yet)? The only additional thing is better warning messsage. Agree that is so marginal. Will drop it. > >> + char errmsg[STRERR_BUFSIZE]; >> + int iter_fd; >> + >> + iter_fd = bpf_iter_create(bpf_link__fd(link), flags); >> + if (iter_fd < 0) { >> + iter_fd = -errno; >> + pr_warn("failed to create an iterator: %s\n", >> + libbpf_strerror_r(iter_fd, errmsg, sizeof(errmsg))); >> + } >> + >> + return iter_fd; >> +} >> + > > [...] >
diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c index 5cc1b0785d18..7ffd6c0ad95f 100644 --- a/tools/lib/bpf/bpf.c +++ b/tools/lib/bpf/bpf.c @@ -619,6 +619,17 @@ int bpf_link_update(int link_fd, int new_prog_fd, return sys_bpf(BPF_LINK_UPDATE, &attr, sizeof(attr)); } +int bpf_iter_create(int link_fd, unsigned int flags) +{ + union bpf_attr attr; + + memset(&attr, 0, sizeof(attr)); + attr.iter_create.link_fd = link_fd; + attr.iter_create.flags = flags; + + return sys_bpf(BPF_ITER_CREATE, &attr, sizeof(attr)); +} + int bpf_prog_query(int target_fd, enum bpf_attach_type type, __u32 query_flags, __u32 *attach_flags, __u32 *prog_ids, __u32 *prog_cnt) { diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h index 46d47afdd887..db9df303090e 100644 --- a/tools/lib/bpf/bpf.h +++ b/tools/lib/bpf/bpf.h @@ -187,6 +187,8 @@ struct bpf_link_update_opts { LIBBPF_API int bpf_link_update(int link_fd, int new_prog_fd, const struct bpf_link_update_opts *opts); +LIBBPF_API int bpf_iter_create(int link_fd, unsigned int flags); + struct bpf_prog_test_run_attr { int prog_fd; int repeat; diff --git a/tools/lib/bpf/bpf_tracing.h b/tools/lib/bpf/bpf_tracing.h index f3f3c3fb98cb..4a6dffaa7e57 100644 --- a/tools/lib/bpf/bpf_tracing.h +++ b/tools/lib/bpf/bpf_tracing.h @@ -413,4 +413,27 @@ typeof(name(0)) name(struct pt_regs *ctx) \ } \ static __always_inline typeof(name(0)) ____##name(struct pt_regs *ctx, ##args) +/* + * BPF_SEQ_PRINTF to wrap bpf_seq_printf to-be-printed values + * in a structure. BPF_SEQ_PRINTF0 is a simple wrapper for + * bpf_seq_printf(). + */ +#define BPF_SEQ_PRINTF0(seq, fmt) \ + ({ \ + int ret = bpf_seq_printf(seq, fmt, sizeof(fmt), \ + (void *)0, 0); \ + ret; \ + }) + +#define BPF_SEQ_PRINTF(seq, fmt, args...) \ + ({ \ + _Pragma("GCC diagnostic push") \ + _Pragma("GCC diagnostic ignored \"-Wint-conversion\"") \ + __u64 param[___bpf_narg(args)] = { args }; \ + _Pragma("GCC diagnostic pop") \ + int ret = bpf_seq_printf(seq, fmt, sizeof(fmt), \ + param, sizeof(param)); \ + ret; \ + }) + #endif diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c index 8e1dc6980fac..ffdc4d8e0cc0 100644 --- a/tools/lib/bpf/libbpf.c +++ b/tools/lib/bpf/libbpf.c @@ -6366,6 +6366,9 @@ static const struct bpf_sec_def section_defs[] = { .is_attach_btf = true, .expected_attach_type = BPF_LSM_MAC, .attach_fn = attach_lsm), + SEC_DEF("iter/", TRACING, + .expected_attach_type = BPF_TRACE_ITER, + .is_attach_btf = true), BPF_PROG_SEC("xdp", BPF_PROG_TYPE_XDP), BPF_PROG_SEC("perf_event", BPF_PROG_TYPE_PERF_EVENT), BPF_PROG_SEC("lwt_in", BPF_PROG_TYPE_LWT_IN), @@ -6629,6 +6632,7 @@ static int bpf_object__collect_struct_ops_map_reloc(struct bpf_object *obj, #define BTF_TRACE_PREFIX "btf_trace_" #define BTF_LSM_PREFIX "bpf_lsm_" +#define BTF_ITER_PREFIX "__bpf_iter__" #define BTF_MAX_NAME_SIZE 128 static int find_btf_by_prefix_kind(const struct btf *btf, const char *prefix, @@ -6659,6 +6663,9 @@ static inline int __find_vmlinux_btf_id(struct btf *btf, const char *name, else if (attach_type == BPF_LSM_MAC) err = find_btf_by_prefix_kind(btf, BTF_LSM_PREFIX, name, BTF_KIND_FUNC); + else if (attach_type == BPF_TRACE_ITER) + err = find_btf_by_prefix_kind(btf, BTF_ITER_PREFIX, name, + BTF_KIND_FUNC); else err = btf__find_by_name_kind(btf, name, BTF_KIND_FUNC); @@ -7617,6 +7624,59 @@ bpf_program__attach_cgroup(struct bpf_program *prog, int cgroup_fd) return link; } +struct bpf_link * +bpf_program__attach_iter(struct bpf_program *prog, + const struct bpf_iter_attach_opts *opts) +{ + enum bpf_attach_type attach_type; + char errmsg[STRERR_BUFSIZE]; + struct bpf_link *link; + int prog_fd, link_fd; + + if (!OPTS_VALID(opts, bpf_iter_attach_opts)) + return ERR_PTR(-EINVAL); + + prog_fd = bpf_program__fd(prog); + if (prog_fd < 0) { + pr_warn("program '%s': can't attach before loaded\n", + bpf_program__title(prog, false)); + return ERR_PTR(-EINVAL); + } + + link = calloc(1, sizeof(*link)); + if (!link) + return ERR_PTR(-ENOMEM); + link->detach = &bpf_link__detach_fd; + + attach_type = bpf_program__get_expected_attach_type(prog); + link_fd = bpf_link_create(prog_fd, 0, attach_type, NULL); + if (link_fd < 0) { + link_fd = -errno; + free(link); + pr_warn("program '%s': failed to attach to iterator: %s\n", + bpf_program__title(prog, false), + libbpf_strerror_r(link_fd, errmsg, sizeof(errmsg))); + return ERR_PTR(link_fd); + } + link->fd = link_fd; + return link; +} + +int bpf_link__create_iter(struct bpf_link *link, unsigned int flags) +{ + char errmsg[STRERR_BUFSIZE]; + int iter_fd; + + iter_fd = bpf_iter_create(bpf_link__fd(link), flags); + if (iter_fd < 0) { + iter_fd = -errno; + pr_warn("failed to create an iterator: %s\n", + libbpf_strerror_r(iter_fd, errmsg, sizeof(errmsg))); + } + + return iter_fd; +} + struct bpf_link *bpf_program__attach(struct bpf_program *prog) { const struct bpf_sec_def *sec_def; diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h index f1dacecb1619..abe5786fcab3 100644 --- a/tools/lib/bpf/libbpf.h +++ b/tools/lib/bpf/libbpf.h @@ -258,6 +258,17 @@ struct bpf_map; LIBBPF_API struct bpf_link *bpf_map__attach_struct_ops(struct bpf_map *map); +struct bpf_iter_attach_opts { + size_t sz; /* size of this struct for forward/backward compatibility */ +}; +#define bpf_iter_attach_opts__last_field sz + +LIBBPF_API struct bpf_link * +bpf_program__attach_iter(struct bpf_program *prog, + const struct bpf_iter_attach_opts *opts); +LIBBPF_API int +bpf_link__create_iter(struct bpf_link *link, unsigned int flags); + struct bpf_insn; /* diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map index bb8831605b25..1cea36f9f2e2 100644 --- a/tools/lib/bpf/libbpf.map +++ b/tools/lib/bpf/libbpf.map @@ -254,3 +254,10 @@ LIBBPF_0.0.8 { bpf_program__set_lsm; bpf_set_link_xdp_fd_opts; } LIBBPF_0.0.7; + +LIBBPF_0.0.9 { + global: + bpf_link__create_iter; + bpf_program__attach_iter; + bpf_iter_create; +} LIBBPF_0.0.8;
Three new libbpf APIs are added to support bpf_iter: - bpf_program__attach_iter Given a bpf program and additional parameters, which is none now, returns a bpf_link. - bpf_link__create_iter Given a bpf_link, create a bpf_iter and return a fd so user can then do read() to get seq_file output data. - bpf_iter_create syscall level API to create a bpf iterator. Two macros, BPF_SEQ_PRINTF0 and BPF_SEQ_PRINTF, are also introduced. These two macros can help bpf program writers with nicer bpf_seq_printf syntax similar to the kernel one. Signed-off-by: Yonghong Song <yhs@fb.com> --- tools/lib/bpf/bpf.c | 11 +++++++ tools/lib/bpf/bpf.h | 2 ++ tools/lib/bpf/bpf_tracing.h | 23 ++++++++++++++ tools/lib/bpf/libbpf.c | 60 +++++++++++++++++++++++++++++++++++++ tools/lib/bpf/libbpf.h | 11 +++++++ tools/lib/bpf/libbpf.map | 7 +++++ 6 files changed, 114 insertions(+)