[RFC,bpf-next,08/16] bpf: Use delayed link free in bpf_link_put

Message ID 20201022082138.2322434-9-jolsa@kernel.org
State Not Applicable
Delegated to: BPF Maintainers
Series bpf: Speed up trampoline attach

Checks

Context Check Description
jkicinski/cover_letter success Link
jkicinski/fixes_present success Link
jkicinski/patch_count fail Series longer than 15 patches
jkicinski/tree_selection success Clearly marked for bpf-next
jkicinski/subject_prefix success Link
jkicinski/source_inline success Was 0 now: 0
jkicinski/verify_signedoff success Link
jkicinski/module_param success Was 0 now: 0
jkicinski/build_32bit success Errors and warnings before: 1 this patch: 1
jkicinski/kdoc success Errors and warnings before: 0 this patch: 0
jkicinski/verify_fixes success Link
jkicinski/checkpatch success total: 0 errors, 0 warnings, 0 checks, 14 lines checked
jkicinski/build_allmodconfig_warn success Errors and warnings before: 1 this patch: 1
jkicinski/header_inline success Link
jkicinski/stable success Stable not CCed

Commit Message

Jiri Olsa Oct. 22, 2020, 8:21 a.m. UTC
Move the bpf_link_free call into delayed processing so we don't
need to wait for it when releasing the link.

For example, bpf_tracing_link_release can take a considerable
amount of time in bpf_trampoline_put due to the
synchronize_rcu_tasks call.

It speeds up bpftrace's release time in the following example:

Before:

 Performance counter stats for './src/bpftrace -ve kfunc:__x64_sys_s*
    { printf("test\n"); } i:ms:10 { printf("exit\n"); exit();}' (5 runs):

     3,290,457,628      cycles:k                                 ( +-  0.27% )
       933,581,973      cycles:u                                 ( +-  0.20% )

             50.25 +- 4.79 seconds time elapsed  ( +-  9.53% )

After:

 Performance counter stats for './src/bpftrace -ve kfunc:__x64_sys_s*
    { printf("test\n"); } i:ms:10 { printf("exit\n"); exit();}' (5 runs):

     2,535,458,767      cycles:k                                 ( +-  0.55% )
       940,046,382      cycles:u                                 ( +-  0.27% )

             33.60 +- 3.27 seconds time elapsed  ( +-  9.73% )

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 kernel/bpf/syscall.c | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)
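
For context, a condensed sketch of bpf_link_put() as it looks after this
patch; bpf_link_put_deferred() is reproduced from kernel/bpf/syscall.c
(surrounding code omitted):

static void bpf_link_put_deferred(struct work_struct *work)
{
	struct bpf_link *link = container_of(work, struct bpf_link, work);

	bpf_link_free(link);
}

void bpf_link_put(struct bpf_link *link)
{
	if (!atomic64_dec_and_test(&link->refcnt))
		return;

	/* Always defer the free to a workqueue, so the caller never
	 * blocks on synchronize_rcu_tasks() via bpf_trampoline_put().
	 */
	INIT_WORK(&link->work, bpf_link_put_deferred);
	schedule_work(&link->work);
}

The trade-off is that link teardown now always completes asynchronously,
which is what the review below objects to.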

Comments

Andrii Nakryiko Oct. 23, 2020, 7:46 p.m. UTC | #1
On Thu, Oct 22, 2020 at 8:01 AM Jiri Olsa <jolsa@kernel.org> wrote:
>
> Moving bpf_link_free call into delayed processing so we don't
> need to wait for it when releasing the link.
>
> For example bpf_tracing_link_release could take considerable
> amount of time in bpf_trampoline_put function due to
> synchronize_rcu_tasks call.
>
> It speeds up bpftrace release time in following example:
>
> Before:
>
>  Performance counter stats for './src/bpftrace -ve kfunc:__x64_sys_s*
>     { printf("test\n"); } i:ms:10 { printf("exit\n"); exit();}' (5 runs):
>
>      3,290,457,628      cycles:k                                 ( +-  0.27% )
>        933,581,973      cycles:u                                 ( +-  0.20% )
>
>              50.25 +- 4.79 seconds time elapsed  ( +-  9.53% )
>
> After:
>
>  Performance counter stats for './src/bpftrace -ve kfunc:__x64_sys_s*
>     { printf("test\n"); } i:ms:10 { printf("exit\n"); exit();}' (5 runs):
>
>      2,535,458,767      cycles:k                                 ( +-  0.55% )
>        940,046,382      cycles:u                                 ( +-  0.27% )
>
>              33.60 +- 3.27 seconds time elapsed  ( +-  9.73% )
>
> Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> ---
>  kernel/bpf/syscall.c | 8 ++------
>  1 file changed, 2 insertions(+), 6 deletions(-)
>
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index 1110ecd7d1f3..61ef29f9177d 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -2346,12 +2346,8 @@ void bpf_link_put(struct bpf_link *link)
>         if (!atomic64_dec_and_test(&link->refcnt))
>                 return;
>
> -       if (in_atomic()) {
> -               INIT_WORK(&link->work, bpf_link_put_deferred);
> -               schedule_work(&link->work);
> -       } else {
> -               bpf_link_free(link);
> -       }
> +       INIT_WORK(&link->work, bpf_link_put_deferred);
> +       schedule_work(&link->work);

We just recently reverted this exact change. Doing this makes it
non-deterministic from user-space POV when the BPF program is
**actually** detached. This makes user-space programming much more
complicated and unpredictable. So please don't do this. Let's find
some other way to speed this up.

>  }
>
>  static int bpf_link_release(struct inode *inode, struct file *filp)
> --
> 2.26.2
>
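
To make the determinism concern concrete, here is a hypothetical user-space
sequence (the prog_still_attached() helper is made up for illustration) that
relies on detach being synchronous with the final close():

/* Hypothetical sketch: user space commonly assumes that once the last
 * link FD is closed, the BPF program can no longer fire.
 */
close(link_fd);                  /* drops the last ref -> bpf_link_put()   */
assert(!prog_still_attached());  /* hypothetical check; only holds if the
                                  * free path runs synchronously           */

With the unconditional schedule_work() above, the program may keep running
for an unbounded time after close() returns, which is what makes user-space
behaviour unpredictable.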
Jiri Olsa Oct. 25, 2020, 7:02 p.m. UTC | #2
On Fri, Oct 23, 2020 at 12:46:15PM -0700, Andrii Nakryiko wrote:
> On Thu, Oct 22, 2020 at 8:01 AM Jiri Olsa <jolsa@kernel.org> wrote:
> >
> > Moving bpf_link_free call into delayed processing so we don't
> > need to wait for it when releasing the link.
> >
> > For example bpf_tracing_link_release could take considerable
> > amount of time in bpf_trampoline_put function due to
> > synchronize_rcu_tasks call.
> >
> > It speeds up bpftrace release time in following example:
> >
> > Before:
> >
> >  Performance counter stats for './src/bpftrace -ve kfunc:__x64_sys_s*
> >     { printf("test\n"); } i:ms:10 { printf("exit\n"); exit();}' (5 runs):
> >
> >      3,290,457,628      cycles:k                                 ( +-  0.27% )
> >        933,581,973      cycles:u                                 ( +-  0.20% )
> >
> >              50.25 +- 4.79 seconds time elapsed  ( +-  9.53% )
> >
> > After:
> >
> >  Performance counter stats for './src/bpftrace -ve kfunc:__x64_sys_s*
> >     { printf("test\n"); } i:ms:10 { printf("exit\n"); exit();}' (5 runs):
> >
> >      2,535,458,767      cycles:k                                 ( +-  0.55% )
> >        940,046,382      cycles:u                                 ( +-  0.27% )
> >
> >              33.60 +- 3.27 seconds time elapsed  ( +-  9.73% )
> >
> > Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> > ---
> >  kernel/bpf/syscall.c | 8 ++------
> >  1 file changed, 2 insertions(+), 6 deletions(-)
> >
> > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> > index 1110ecd7d1f3..61ef29f9177d 100644
> > --- a/kernel/bpf/syscall.c
> > +++ b/kernel/bpf/syscall.c
> > @@ -2346,12 +2346,8 @@ void bpf_link_put(struct bpf_link *link)
> >         if (!atomic64_dec_and_test(&link->refcnt))
> >                 return;
> >
> > -       if (in_atomic()) {
> > -               INIT_WORK(&link->work, bpf_link_put_deferred);
> > -               schedule_work(&link->work);
> > -       } else {
> > -               bpf_link_free(link);
> > -       }
> > +       INIT_WORK(&link->work, bpf_link_put_deferred);
> > +       schedule_work(&link->work);
> 
> We just recently reverted this exact change. Doing this makes it
> non-deterministic from user-space POV when the BPF program is
> **actually** detached. This makes user-space programming much more
> complicated and unpredictable. So please don't do this. Let's find
> some other way to speed this up.

ok, makes sense

jirka

> 
> >  }
> >
> >  static int bpf_link_release(struct inode *inode, struct file *filp)
> > --
> > 2.26.2
> >
>

Patch

diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 1110ecd7d1f3..61ef29f9177d 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -2346,12 +2346,8 @@ void bpf_link_put(struct bpf_link *link)
 	if (!atomic64_dec_and_test(&link->refcnt))
 		return;
 
-	if (in_atomic()) {
-		INIT_WORK(&link->work, bpf_link_put_deferred);
-		schedule_work(&link->work);
-	} else {
-		bpf_link_free(link);
-	}
+	INIT_WORK(&link->work, bpf_link_put_deferred);
+	schedule_work(&link->work);
 }
 
 static int bpf_link_release(struct inode *inode, struct file *filp)