[bpf] flow_dissector: Drop BPF flow dissector prog ref on netns cleanup

Message ID 20200520172258.551075-1-jakub@cloudflare.com
State Changes Requested
Delegated to: BPF Maintainers

Commit Message

Jakub Sitnicki May 20, 2020, 5:22 p.m. UTC
When attaching a flow dissector program to a network namespace with
bpf(BPF_PROG_ATTACH, ...) we grab a reference to bpf_prog.

If netns gets destroyed while a flow dissector is still attached, and there
are no other references to the prog, we leak the reference and the program
remains loaded.

Leak can be reproduced by running flow dissector tests from selftests/bpf:

  # bpftool prog list
  # ./test_flow_dissector.sh
  ...
  selftests: test_flow_dissector [PASS]
  # bpftool prog list
  4: flow_dissector  name _dissect  tag e314084d332a5338  gpl
          loaded_at 2020-05-20T18:50:53+0200  uid 0
          xlated 552B  jited 355B  memlock 4096B  map_ids 3,4
          btf_id 4
  #

Fix it by detaching the flow dissector program when netns is going away.

Fixes: d58e468b1112 ("flow_dissector: implements flow dissector BPF hook")
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---

Discovered while working on bpf_link support for netns-attached progs.
Looks like bpf tree material so pushing it out separately.

-jkbs

 net/core/flow_dissector.c | 29 ++++++++++++++++++++++++++++-
 1 file changed, 28 insertions(+), 1 deletion(-)
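
The leak described in the commit message can be illustrated with a small userspace model. This is only a sketch under stated assumptions: every model_* name is hypothetical, the plain int counter stands in for the bpf_prog reference count, and no kernel API is used.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins; none of these are the kernel's real types. */
struct model_prog {
	int refcnt;		/* models the bpf_prog reference count */
};

struct model_net {
	struct model_prog *flow_dissector_prog;	/* attached program or NULL */
};

/* Attach: the namespace takes its own reference on the program. */
static void model_attach(struct model_net *net, struct model_prog *prog)
{
	prog->refcnt++;
	net->flow_dissector_prog = prog;
}

/* Cleanup hook modelled on the patch: drop the namespace's reference. */
static void model_pre_exit(struct model_net *net)
{
	struct model_prog *attached = net->flow_dissector_prog;

	if (attached) {
		net->flow_dissector_prog = NULL;
		attached->refcnt--;
	}
}

/* Tear down a namespace, optionally running the cleanup hook first.
 * Returns the program's reference count after teardown.
 */
static int model_netns_destroy(struct model_net *net, struct model_prog *prog,
			       int run_pre_exit)
{
	if (run_pre_exit)
		model_pre_exit(net);
	/* The namespace is gone from here on; nothing can drop its reference. */
	return prog->refcnt;
}

/* Load, attach, close the loader's fd, destroy the namespace.
 * Returns the number of references left over afterwards.
 */
static int model_leftover_refs(int run_pre_exit)
{
	struct model_prog prog = { .refcnt = 1 };	/* held by the loader's fd */
	struct model_net net = { NULL };

	model_attach(&net, &prog);
	prog.refcnt--;					/* loader closes its fd */
	return model_netns_destroy(&net, &prog, run_pre_exit);
}
```

In this model, model_leftover_refs(0) leaves one reference behind (the program stays loaded), while model_leftover_refs(1) drops the count to zero.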

Comments

Stanislav Fomichev May 20, 2020, 5:40 p.m. UTC | #1
On 05/20, Jakub Sitnicki wrote:
> When attaching a flow dissector program to a network namespace with
> bpf(BPF_PROG_ATTACH, ...) we grab a reference to bpf_prog.

> If netns gets destroyed while a flow dissector is still attached, and there
> are no other references to the prog, we leak the reference and the program
> remains loaded.

> Leak can be reproduced by running flow dissector tests from selftests/bpf:

>    # bpftool prog list
>    # ./test_flow_dissector.sh
>    ...
>    selftests: test_flow_dissector [PASS]
>    # bpftool prog list
>    4: flow_dissector  name _dissect  tag e314084d332a5338  gpl
>            loaded_at 2020-05-20T18:50:53+0200  uid 0
>            xlated 552B  jited 355B  memlock 4096B  map_ids 3,4
>            btf_id 4
>    #

> Fix it by detaching the flow dissector program when netns is going away.

> Fixes: d58e468b1112 ("flow_dissector: implements flow dissector BPF hook")
> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
> ---

> Discovered while working on bpf_link support for netns-attached progs.
> Looks like bpf tree material so pushing it out separately.

Oh, good catch!

> -jkbs

>   net/core/flow_dissector.c | 29 ++++++++++++++++++++++++++++-
>   1 file changed, 28 insertions(+), 1 deletion(-)

> diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
> index 3eff84824c8b..b6179cd20158 100644
> --- a/net/core/flow_dissector.c
> +++ b/net/core/flow_dissector.c
> @@ -179,6 +179,27 @@ int skb_flow_dissector_bpf_prog_detach(const union bpf_attr *attr)
>   	return 0;
>   }

> +static void __net_exit flow_dissector_pernet_pre_exit(struct net *net)
> +{
> +	struct bpf_prog *attached;
> +
> +	/* We don't lock the update-side because there are no
> +	 * references left to this netns when we get called. Hence
> +	 * there can be no attach/detach in progress.
> +	 */
> +	rcu_read_lock();
> +	attached = rcu_dereference(net->flow_dissector_prog);
> +	if (attached) {
> +		RCU_INIT_POINTER(net->flow_dissector_prog, NULL);
> +		bpf_prog_put(attached);
> +	}
> +	rcu_read_unlock();
> +}
I wonder, should we instead refactor the existing
skb_flow_dissector_bpf_prog_detach to accept netns (instead of attr)
and call that here, instead of reimplementing it? (I don't think we
care about mutex lock/unlock efficiency here.) Thoughts?
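
A rough userspace sketch of the refactoring suggested above: one detach helper that takes the namespace directly and is shared by the syscall path and the netns cleanup path. All model_* names are hypothetical, the pthread mutex stands in for the kernel-side mutex, and returning -ENOENT when nothing is attached is an assumption of this model.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Userspace stand-ins for bpf_prog and struct net. */
struct model_prog { int refcnt; };
struct model_net { struct model_prog *flow_dissector_prog; };

/* Stands in for the global mutex that serializes attach/detach. */
static pthread_mutex_t model_flow_dissector_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Shared helper: operates on a namespace instead of a bpf_attr. */
static int model_prog_detach(struct model_net *net)
{
	struct model_prog *attached;
	int err = 0;

	pthread_mutex_lock(&model_flow_dissector_mutex);
	attached = net->flow_dissector_prog;
	if (!attached) {
		err = -ENOENT;		/* assumption: nothing attached is an error */
	} else {
		net->flow_dissector_prog = NULL;
		attached->refcnt--;	/* models bpf_prog_put() */
	}
	pthread_mutex_unlock(&model_flow_dissector_mutex);
	return err;
}

/* Syscall-side entry point: the caller resolves its namespace, then delegates. */
static int model_syscall_detach(struct model_net *current_net)
{
	return model_prog_detach(current_net);
}

/* Namespace teardown: same helper, return value deliberately ignored. */
static void model_pre_exit(struct model_net *net)
{
	(void)model_prog_detach(net);
}
```

The cost of sharing the code this way is that every namespace teardown takes the global mutex, which is the trade-off raised in the question.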
Alexei Starovoitov May 21, 2020, 12:56 a.m. UTC | #2
On Wed, May 20, 2020 at 10:40:00AM -0700, sdf@google.com wrote:
> On 05/20, Jakub Sitnicki wrote:
> > When attaching a flow dissector program to a network namespace with
> > bpf(BPF_PROG_ATTACH, ...) we grab a reference to bpf_prog.
> 
> > If netns gets destroyed while a flow dissector is still attached, and there
> > are no other references to the prog, we leak the reference and the program
> > remains loaded.
> 
> > Leak can be reproduced by running flow dissector tests from selftests/bpf:
> 
> >    # bpftool prog list
> >    # ./test_flow_dissector.sh
> >    ...
> >    selftests: test_flow_dissector [PASS]
> >    # bpftool prog list
> >    4: flow_dissector  name _dissect  tag e314084d332a5338  gpl
> >            loaded_at 2020-05-20T18:50:53+0200  uid 0
> >            xlated 552B  jited 355B  memlock 4096B  map_ids 3,4
> >            btf_id 4
> >    #
> 
> > Fix it by detaching the flow dissector program when netns is going away.
> 
> > Fixes: d58e468b1112 ("flow_dissector: implements flow dissector BPF hook")
> > Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
> > ---
> 
> > Discovered while working on bpf_link support for netns-attached progs.
> > Looks like bpf tree material so pushing it out separately.
> Oh, good catch!

Good catch indeed!

> 
> > -jkbs
> 
> >   net/core/flow_dissector.c | 29 ++++++++++++++++++++++++++++-
> >   1 file changed, 28 insertions(+), 1 deletion(-)
> 
> > diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
> > index 3eff84824c8b..b6179cd20158 100644
> > --- a/net/core/flow_dissector.c
> > +++ b/net/core/flow_dissector.c
> > @@ -179,6 +179,27 @@ int skb_flow_dissector_bpf_prog_detach(const union bpf_attr *attr)
> >   	return 0;
> >   }
> 
> > +static void __net_exit flow_dissector_pernet_pre_exit(struct net *net)
> > +{
> > +	struct bpf_prog *attached;
> > +
> > +	/* We don't lock the update-side because there are no
> > +	 * references left to this netns when we get called. Hence
> > +	 * there can be no attach/detach in progress.
> > +	 */
> > +	rcu_read_lock();
> > +	attached = rcu_dereference(net->flow_dissector_prog);
> > +	if (attached) {
> > +		RCU_INIT_POINTER(net->flow_dissector_prog, NULL);
> > +		bpf_prog_put(attached);
> > +	}
> > +	rcu_read_unlock();
> > +}
> I wonder, should we instead refactor the existing
> skb_flow_dissector_bpf_prog_detach to accept netns (instead of attr)
> and call that here, instead of reimplementing it? (I don't think we
> care about mutex lock/unlock efficiency here.) Thoughts?

Agree. Would be good to share that bit of code.
Jakub Sitnicki May 21, 2020, 8:42 a.m. UTC | #3
On Wed, 20 May 2020 10:40:00 -0700
sdf@google.com wrote:

> > +static void __net_exit flow_dissector_pernet_pre_exit(struct net *net)
> > +{
> > +	struct bpf_prog *attached;
> > +
> > +	/* We don't lock the update-side because there are no
> > +	 * references left to this netns when we get called. Hence
> > +	 * there can be no attach/detach in progress.
> > +	 */
> > +	rcu_read_lock();
> > +	attached = rcu_dereference(net->flow_dissector_prog);
> > +	if (attached) {
> > +		RCU_INIT_POINTER(net->flow_dissector_prog, NULL);
> > +		bpf_prog_put(attached);
> > +	}
> > +	rcu_read_unlock();
> > +}  
> I wonder, should we instead refactor the existing
> skb_flow_dissector_bpf_prog_detach to accept netns (instead of attr)
> and call that here, instead of reimplementing it? (I don't think we
> care about mutex lock/unlock efficiency here.) Thoughts?

I wanted to be nice to container-heavy workloads, where network
namespaces get torn down frequently and in parallel, and avoid
locking a global mutex. OTOH, we already do that today, for instance in
the devlink pre_exit callback.

In our case I think there is a way to have our cake and eat it too:

https://lore.kernel.org/bpf/20200521083435.560256-1-jakub@cloudflare.com/

Thanks for reviewing it,
-jkbs
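
One possible way to share the detach code and still keep namespace teardown off the global mutex in the common case is to peek at the pointer without the lock and only call the locked helper when a program is attached. The sketch below is a userspace model of that idea, not a copy of the linked follow-up. All model_* names are hypothetical, and the "lock" is a counter so the number of acquisitions can be observed.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins for bpf_prog and struct net. */
struct model_prog { int refcnt; };
struct model_net { struct model_prog *flow_dissector_prog; };

/* Counts how many times the (modelled) global lock was taken. */
static int model_lock_acquisitions;

static void model_global_lock(void)
{
	model_lock_acquisitions++;
}

static void model_global_unlock(void)
{
}

/* Locked detach helper, shared with the syscall path in this model. */
static void model_prog_detach(struct model_net *net)
{
	struct model_prog *attached;

	model_global_lock();
	attached = net->flow_dissector_prog;
	if (attached) {
		net->flow_dissector_prog = NULL;
		attached->refcnt--;	/* models bpf_prog_put() */
	}
	model_global_unlock();
}

/* Teardown hook: unlocked peek first. This relies on the assumption
 * stated in the patch comment that no attach/detach can be in progress
 * once the namespace has no references left.
 */
static void model_pre_exit(struct model_net *net)
{
	if (net->flow_dissector_prog)
		model_prog_detach(net);
}
```

With this shape, tearing down a namespace that never had a flow dissector attached does not touch the lock at all.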
Andrii Nakryiko May 21, 2020, 7:08 p.m. UTC | #4
On Wed, May 20, 2020 at 10:24 AM Jakub Sitnicki <jakub@cloudflare.com> wrote:
>
> When attaching a flow dissector program to a network namespace with
> bpf(BPF_PROG_ATTACH, ...) we grab a reference to bpf_prog.
>
> If netns gets destroyed while a flow dissector is still attached, and there
> are no other references to the prog, we leak the reference and the program
> remains loaded.
>
> Leak can be reproduced by running flow dissector tests from selftests/bpf:
>
>   # bpftool prog list
>   # ./test_flow_dissector.sh
>   ...
>   selftests: test_flow_dissector [PASS]
>   # bpftool prog list
>   4: flow_dissector  name _dissect  tag e314084d332a5338  gpl
>           loaded_at 2020-05-20T18:50:53+0200  uid 0
>           xlated 552B  jited 355B  memlock 4096B  map_ids 3,4
>           btf_id 4
>   #
>
> Fix it by detaching the flow dissector program when netns is going away.
>
> Fixes: d58e468b1112 ("flow_dissector: implements flow dissector BPF hook")
> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
> ---
>
> Discovered while working on bpf_link support for netns-attached progs.
> Looks like bpf tree material so pushing it out separately.
>
> -jkbs
>

[...]

>  /**
>   * __skb_flow_get_ports - extract the upper layer ports and return them
>   * @skb: sk_buff to extract the ports from
> @@ -1827,6 +1848,8 @@ EXPORT_SYMBOL(flow_keys_basic_dissector);
>
>  static int __init init_default_flow_dissectors(void)
>  {
> +       int err;
> +
>         skb_flow_dissector_init(&flow_keys_dissector,
>                                 flow_keys_dissector_keys,
>                                 ARRAY_SIZE(flow_keys_dissector_keys));
> @@ -1836,7 +1859,11 @@ static int __init init_default_flow_dissectors(void)
>         skb_flow_dissector_init(&flow_keys_basic_dissector,
>                                 flow_keys_basic_dissector_keys,
>                                 ARRAY_SIZE(flow_keys_basic_dissector_keys));
> -       return 0;
> +
> +       err = register_pernet_subsys(&flow_dissector_pernet_ops);
> +
> +       WARN_ON(err);

syzbot simulates memory allocation failures, which can bubble up here,
so this WARN_ON will probably trigger. I wonder if this could be
rewritten so that init fails, when registration fails? What are the
consequences?
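
To make the question concrete, here is a userspace sketch contrasting the two shapes of the init function: one that warns and returns the error, and one that only returns it. The model_* names are hypothetical, the registration stand-in fails on demand to mimic an injected allocation failure, and a counter stands in for WARN_ON().

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for register_pernet_subsys(); fails on demand. */
static int model_register_pernet_subsys(int simulate_enomem)
{
	return simulate_enomem ? -ENOMEM : 0;
}

/* Counts how often the warning path fired. */
static int model_warnings;

/* Shape used in the patch: warn on failure, then return the error. */
static int model_init_with_warn(int simulate_enomem)
{
	int err = model_register_pernet_subsys(simulate_enomem);

	if (err)
		model_warnings++;	/* stands in for WARN_ON(err) */
	return err;
}

/* Alternative shape: propagate the error and let the caller decide. */
static int model_init_plain(int simulate_enomem)
{
	return model_register_pernet_subsys(simulate_enomem);
}
```

Both variants report the failure to the caller; the only difference in this model is whether an injected failure also produces a warning.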

> +       return err;
>  }
>
>  core_initcall(init_default_flow_dissectors);
> --
> 2.25.4
>
Alexei Starovoitov May 22, 2020, 12:53 a.m. UTC | #5
On Thu, May 21, 2020 at 12:09 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Wed, May 20, 2020 at 10:24 AM Jakub Sitnicki <jakub@cloudflare.com> wrote:
> >
> > When attaching a flow dissector program to a network namespace with
> > bpf(BPF_PROG_ATTACH, ...) we grab a reference to bpf_prog.
> >
> > If netns gets destroyed while a flow dissector is still attached, and there
> > are no other references to the prog, we leak the reference and the program
> > remains loaded.
> >
> > Leak can be reproduced by running flow dissector tests from selftests/bpf:
> >
> >   # bpftool prog list
> >   # ./test_flow_dissector.sh
> >   ...
> >   selftests: test_flow_dissector [PASS]
> >   # bpftool prog list
> >   4: flow_dissector  name _dissect  tag e314084d332a5338  gpl
> >           loaded_at 2020-05-20T18:50:53+0200  uid 0
> >           xlated 552B  jited 355B  memlock 4096B  map_ids 3,4
> >           btf_id 4
> >   #
> >
> > Fix it by detaching the flow dissector program when netns is going away.
> >
> > Fixes: d58e468b1112 ("flow_dissector: implements flow dissector BPF hook")
> > Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
> > ---
> >
> > Discovered while working on bpf_link support for netns-attached progs.
> > Looks like bpf tree material so pushing it out separately.
> >
> > -jkbs
> >
>
> [...]
>
> >  /**
> >   * __skb_flow_get_ports - extract the upper layer ports and return them
> >   * @skb: sk_buff to extract the ports from
> > @@ -1827,6 +1848,8 @@ EXPORT_SYMBOL(flow_keys_basic_dissector);
> >
> >  static int __init init_default_flow_dissectors(void)
> >  {
> > +       int err;
> > +
> >         skb_flow_dissector_init(&flow_keys_dissector,
> >                                 flow_keys_dissector_keys,
> >                                 ARRAY_SIZE(flow_keys_dissector_keys));
> > @@ -1836,7 +1859,11 @@ static int __init init_default_flow_dissectors(void)
> >         skb_flow_dissector_init(&flow_keys_basic_dissector,
> >                                 flow_keys_basic_dissector_keys,
> >                                 ARRAY_SIZE(flow_keys_basic_dissector_keys));
> > -       return 0;
> > +
> > +       err = register_pernet_subsys(&flow_dissector_pernet_ops);
> > +
> > +       WARN_ON(err);
>
> syzbot simulates memory allocation failures, which can bubble up here,
> so this WARN_ON will probably trigger. I wonder if this could be
> rewritten so that init fails, when registration fails? What are the
> consequences?

good catch. that warn is pointless.
I removed it and force pushed the bpf tree.
Jakub Sitnicki May 22, 2020, 8:22 a.m. UTC | #6
On Thu, 21 May 2020 17:53:14 -0700
Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:

> On Thu, May 21, 2020 at 12:09 PM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> >
> > On Wed, May 20, 2020 at 10:24 AM Jakub Sitnicki <jakub@cloudflare.com> wrote:  
> > >
> > > When attaching a flow dissector program to a network namespace with
> > > bpf(BPF_PROG_ATTACH, ...) we grab a reference to bpf_prog.
> > >
> > > If netns gets destroyed while a flow dissector is still attached, and there
> > > are no other references to the prog, we leak the reference and the program
> > > remains loaded.
> > >
> > > Leak can be reproduced by running flow dissector tests from selftests/bpf:
> > >
> > >   # bpftool prog list
> > >   # ./test_flow_dissector.sh
> > >   ...
> > >   selftests: test_flow_dissector [PASS]
> > >   # bpftool prog list
> > >   4: flow_dissector  name _dissect  tag e314084d332a5338  gpl
> > >           loaded_at 2020-05-20T18:50:53+0200  uid 0
> > >           xlated 552B  jited 355B  memlock 4096B  map_ids 3,4
> > >           btf_id 4
> > >   #
> > >
> > > Fix it by detaching the flow dissector program when netns is going away.
> > >
> > > Fixes: d58e468b1112 ("flow_dissector: implements flow dissector BPF hook")
> > > Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
> > > ---
> > >
> > > Discovered while working on bpf_link support for netns-attached progs.
> > > Looks like bpf tree material so pushing it out separately.
> > >
> > > -jkbs
> > >  
> >
> > [...]
> >  
> > >  /**
> > >   * __skb_flow_get_ports - extract the upper layer ports and return them
> > >   * @skb: sk_buff to extract the ports from
> > > @@ -1827,6 +1848,8 @@ EXPORT_SYMBOL(flow_keys_basic_dissector);
> > >
> > >  static int __init init_default_flow_dissectors(void)
> > >  {
> > > +       int err;
> > > +
> > >         skb_flow_dissector_init(&flow_keys_dissector,
> > >                                 flow_keys_dissector_keys,
> > >                                 ARRAY_SIZE(flow_keys_dissector_keys));
> > > @@ -1836,7 +1859,11 @@ static int __init init_default_flow_dissectors(void)
> > >         skb_flow_dissector_init(&flow_keys_basic_dissector,
> > >                                 flow_keys_basic_dissector_keys,
> > >                                 ARRAY_SIZE(flow_keys_basic_dissector_keys));
> > > -       return 0;
> > > +
> > > +       err = register_pernet_subsys(&flow_dissector_pernet_ops);
> > > +
> > > +       WARN_ON(err);  
> >
> > syzbot simulates memory allocation failures, which can bubble up here,
> > so this WARN_ON will probably trigger. I wonder if this could be
> > rewritten so that init fails, when registration fails? What are the
> > consequences?  
> 
> good catch. that warn is pointless.
> I removed it and force pushed the bpf tree.

Thanks for patching it up. I'll keep it in mind next time.

Patch

diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index 3eff84824c8b..b6179cd20158 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -179,6 +179,27 @@  int skb_flow_dissector_bpf_prog_detach(const union bpf_attr *attr)
 	return 0;
 }
 
+static void __net_exit flow_dissector_pernet_pre_exit(struct net *net)
+{
+	struct bpf_prog *attached;
+
+	/* We don't lock the update-side because there are no
+	 * references left to this netns when we get called. Hence
+	 * there can be no attach/detach in progress.
+	 */
+	rcu_read_lock();
+	attached = rcu_dereference(net->flow_dissector_prog);
+	if (attached) {
+		RCU_INIT_POINTER(net->flow_dissector_prog, NULL);
+		bpf_prog_put(attached);
+	}
+	rcu_read_unlock();
+}
+
+static struct pernet_operations flow_dissector_pernet_ops __net_initdata = {
+	.pre_exit = flow_dissector_pernet_pre_exit,
+};
+
 /**
  * __skb_flow_get_ports - extract the upper layer ports and return them
  * @skb: sk_buff to extract the ports from
@@ -1827,6 +1848,8 @@  EXPORT_SYMBOL(flow_keys_basic_dissector);
 
 static int __init init_default_flow_dissectors(void)
 {
+	int err;
+
 	skb_flow_dissector_init(&flow_keys_dissector,
 				flow_keys_dissector_keys,
 				ARRAY_SIZE(flow_keys_dissector_keys));
@@ -1836,7 +1859,11 @@  static int __init init_default_flow_dissectors(void)
 	skb_flow_dissector_init(&flow_keys_basic_dissector,
 				flow_keys_basic_dissector_keys,
 				ARRAY_SIZE(flow_keys_basic_dissector_keys));
-	return 0;
+
+	err = register_pernet_subsys(&flow_dissector_pernet_ops);
+
+	WARN_ON(err);
+	return err;
 }
 
 core_initcall(init_default_flow_dissectors);