Message ID | 20200520172258.551075-1-jakub@cloudflare.com |
---|---|
State | Changes Requested |
Delegated to: | BPF Maintainers |
Series | [bpf] flow_dissector: Drop BPF flow dissector prog ref on netns cleanup |
On 05/20, Jakub Sitnicki wrote:
> When attaching a flow dissector program to a network namespace with
> bpf(BPF_PROG_ATTACH, ...) we grab a reference to bpf_prog.
>
> If netns gets destroyed while a flow dissector is still attached, and there
> are no other references to the prog, we leak the reference and the program
> remains loaded.
>
> Leak can be reproduced by running flow dissector tests from selftests/bpf:
>
>   # bpftool prog list
>   # ./test_flow_dissector.sh
>   ...
>   selftests: test_flow_dissector [PASS]
>   # bpftool prog list
>   4: flow_dissector name _dissect tag e314084d332a5338 gpl
>           loaded_at 2020-05-20T18:50:53+0200 uid 0
>           xlated 552B jited 355B memlock 4096B map_ids 3,4
>           btf_id 4
>   #
>
> Fix it by detaching the flow dissector program when netns is going away.
>
> Fixes: d58e468b1112 ("flow_dissector: implements flow dissector BPF hook")
> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
> ---
>
> Discovered while working on bpf_link support for netns-attached progs.
> Looks like bpf tree material so pushing it out separately.

Oh, good catch!

> -jkbs
>
>  net/core/flow_dissector.c | 29 ++++++++++++++++++++++++++++-
>  1 file changed, 28 insertions(+), 1 deletion(-)
>
> diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
> index 3eff84824c8b..b6179cd20158 100644
> --- a/net/core/flow_dissector.c
> +++ b/net/core/flow_dissector.c
> @@ -179,6 +179,27 @@ int skb_flow_dissector_bpf_prog_detach(const union bpf_attr *attr)
>  	return 0;
>  }
>
> +static void __net_exit flow_dissector_pernet_pre_exit(struct net *net)
> +{
> +	struct bpf_prog *attached;
> +
> +	/* We don't lock the update-side because there are no
> +	 * references left to this netns when we get called. Hence
> +	 * there can be no attach/detach in progress.
> +	 */
> +	rcu_read_lock();
> +	attached = rcu_dereference(net->flow_dissector_prog);
> +	if (attached) {
> +		RCU_INIT_POINTER(net->flow_dissector_prog, NULL);
> +		bpf_prog_put(attached);
> +	}
> +	rcu_read_unlock();
> +}

I wonder, should we instead refactor the existing
skb_flow_dissector_bpf_prog_detach to accept netns (instead of attr)
and call that here, instead of reimplementing it? (I don't think we
care about mutex lock/unlock efficiency here.) Thoughts?
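The refactor suggested above can be pictured with a small userspace model. This is only a sketch under stated assumptions, not the kernel change: `struct prog`, `struct netns`, `flow_dissector_detach()`, `prog_detach_current()` and `netns_pre_exit()` are hypothetical stand-ins, a pthread mutex stands in for the global attach/detach mutex, and a plain counter stands in for the bpf_prog reference count.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct bpf_prog and struct net. */
struct prog {
	int refcnt;
};

struct netns {
	struct prog *flow_dissector_prog;
};

/* Stand-in for the global mutex that serializes attach/detach. */
static pthread_mutex_t flow_dissector_mutex = PTHREAD_MUTEX_INITIALIZER;

/*
 * Shared helper that takes the namespace directly.
 * Returns 0 if a program was detached, -1 if nothing was attached.
 */
int flow_dissector_detach(struct netns *net)
{
	struct prog *attached;
	int ret = -1;

	pthread_mutex_lock(&flow_dissector_mutex);
	attached = net->flow_dissector_prog;
	if (attached) {
		assert(attached->refcnt > 0);
		net->flow_dissector_prog = NULL;
		attached->refcnt--;	/* models dropping the prog reference */
		ret = 0;
	}
	pthread_mutex_unlock(&flow_dissector_mutex);
	return ret;
}

/* Syscall-style path: the caller resolves its own netns, then delegates. */
int prog_detach_current(struct netns *current_net)
{
	return flow_dissector_detach(current_net);
}

/* Netns teardown path: same helper; "nothing attached" is not an error. */
void netns_pre_exit(struct netns *net)
{
	(void)flow_dissector_detach(net);
}
```

The point of the shape is that both callers go through one locked helper, at the cost of taking the mutex during namespace teardown.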
On Wed, May 20, 2020 at 10:40:00AM -0700, sdf@google.com wrote:
> On 05/20, Jakub Sitnicki wrote:
> > When attaching a flow dissector program to a network namespace with
> > bpf(BPF_PROG_ATTACH, ...) we grab a reference to bpf_prog.
> >
> > If netns gets destroyed while a flow dissector is still attached, and there
> > are no other references to the prog, we leak the reference and the program
> > remains loaded.
> >
> > Leak can be reproduced by running flow dissector tests from selftests/bpf:
> >
> >   # bpftool prog list
> >   # ./test_flow_dissector.sh
> >   ...
> >   selftests: test_flow_dissector [PASS]
> >   # bpftool prog list
> >   4: flow_dissector name _dissect tag e314084d332a5338 gpl
> >           loaded_at 2020-05-20T18:50:53+0200 uid 0
> >           xlated 552B jited 355B memlock 4096B map_ids 3,4
> >           btf_id 4
> >   #
> >
> > Fix it by detaching the flow dissector program when netns is going away.
> >
> > Fixes: d58e468b1112 ("flow_dissector: implements flow dissector BPF hook")
> > Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
> > ---
> >
> > Discovered while working on bpf_link support for netns-attached progs.
> > Looks like bpf tree material so pushing it out separately.
> Oh, good catch!

Good catch indeed!

> > -jkbs
> >
> >  net/core/flow_dissector.c | 29 ++++++++++++++++++++++++++++-
> >  1 file changed, 28 insertions(+), 1 deletion(-)
> >
> > diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
> > index 3eff84824c8b..b6179cd20158 100644
> > --- a/net/core/flow_dissector.c
> > +++ b/net/core/flow_dissector.c
> > @@ -179,6 +179,27 @@ int skb_flow_dissector_bpf_prog_detach(const union bpf_attr *attr)
> >  	return 0;
> >  }
> >
> > +static void __net_exit flow_dissector_pernet_pre_exit(struct net *net)
> > +{
> > +	struct bpf_prog *attached;
> > +
> > +	/* We don't lock the update-side because there are no
> > +	 * references left to this netns when we get called. Hence
> > +	 * there can be no attach/detach in progress.
> > +	 */
> > +	rcu_read_lock();
> > +	attached = rcu_dereference(net->flow_dissector_prog);
> > +	if (attached) {
> > +		RCU_INIT_POINTER(net->flow_dissector_prog, NULL);
> > +		bpf_prog_put(attached);
> > +	}
> > +	rcu_read_unlock();
> > +}
> I wonder, should we instead refactor existing
> skb_flow_dissector_bpf_prog_detach to accept netns (instead of attr)
> can call that here? Instead of reimplementing it (I don't think we
> care about mutex lock/unlock efficiency here?). Thoughts?

Agree. Would be good to share that bit of code.
On Wed, 20 May 2020 10:40:00 -0700 sdf@google.com wrote:
> > +static void __net_exit flow_dissector_pernet_pre_exit(struct net *net)
> > +{
> > +	struct bpf_prog *attached;
> > +
> > +	/* We don't lock the update-side because there are no
> > +	 * references left to this netns when we get called. Hence
> > +	 * there can be no attach/detach in progress.
> > +	 */
> > +	rcu_read_lock();
> > +	attached = rcu_dereference(net->flow_dissector_prog);
> > +	if (attached) {
> > +		RCU_INIT_POINTER(net->flow_dissector_prog, NULL);
> > +		bpf_prog_put(attached);
> > +	}
> > +	rcu_read_unlock();
> > +}
> I wonder, should we instead refactor existing
> skb_flow_dissector_bpf_prog_detach to accept netns (instead of attr)
> can call that here? Instead of reimplementing it (I don't think we
> care about mutex lock/unlock efficiency here?). Thoughts?

I wanted to be nice to container-heavy workloads, where network
namespaces get torn down frequently and in parallel, and avoid locking a
global mutex. OTOH we already do it today, for instance in the devlink
pre_exit callback.

In our case I think there is a way to have the cake and eat it too:

https://lore.kernel.org/bpf/20200521083435.560256-1-jakub@cloudflare.com/

Thanks for reviewing it,
-jkbs
On Wed, May 20, 2020 at 10:24 AM Jakub Sitnicki <jakub@cloudflare.com> wrote:
>
> When attaching a flow dissector program to a network namespace with
> bpf(BPF_PROG_ATTACH, ...) we grab a reference to bpf_prog.
>
> If netns gets destroyed while a flow dissector is still attached, and there
> are no other references to the prog, we leak the reference and the program
> remains loaded.
>
> Leak can be reproduced by running flow dissector tests from selftests/bpf:
>
>   # bpftool prog list
>   # ./test_flow_dissector.sh
>   ...
>   selftests: test_flow_dissector [PASS]
>   # bpftool prog list
>   4: flow_dissector name _dissect tag e314084d332a5338 gpl
>           loaded_at 2020-05-20T18:50:53+0200 uid 0
>           xlated 552B jited 355B memlock 4096B map_ids 3,4
>           btf_id 4
>   #
>
> Fix it by detaching the flow dissector program when netns is going away.
>
> Fixes: d58e468b1112 ("flow_dissector: implements flow dissector BPF hook")
> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
> ---
>
> Discovered while working on bpf_link support for netns-attached progs.
> Looks like bpf tree material so pushing it out separately.
>
> -jkbs
>

[...]

>  /**
>   * __skb_flow_get_ports - extract the upper layer ports and return them
>   * @skb: sk_buff to extract the ports from
> @@ -1827,6 +1848,8 @@ EXPORT_SYMBOL(flow_keys_basic_dissector);
>
>  static int __init init_default_flow_dissectors(void)
>  {
> +	int err;
> +
>  	skb_flow_dissector_init(&flow_keys_dissector,
>  				flow_keys_dissector_keys,
>  				ARRAY_SIZE(flow_keys_dissector_keys));
> @@ -1836,7 +1859,11 @@ static int __init init_default_flow_dissectors(void)
>  	skb_flow_dissector_init(&flow_keys_basic_dissector,
>  				flow_keys_basic_dissector_keys,
>  				ARRAY_SIZE(flow_keys_basic_dissector_keys));
> -	return 0;
> +
> +	err = register_pernet_subsys(&flow_dissector_pernet_ops);
> +
> +	WARN_ON(err);

syzbot simulates memory allocation failures, which can bubble up here,
so this WARN_ON will probably trigger. I wonder if this could be
rewritten so that init fails when registration fails? What are the
consequences?

> +	return err;
>  }
>
>  core_initcall(init_default_flow_dissectors);
> --
> 2.25.4
>
On Thu, May 21, 2020 at 12:09 PM Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
>
> On Wed, May 20, 2020 at 10:24 AM Jakub Sitnicki <jakub@cloudflare.com> wrote:
> >
> > When attaching a flow dissector program to a network namespace with
> > bpf(BPF_PROG_ATTACH, ...) we grab a reference to bpf_prog.
> >
> > If netns gets destroyed while a flow dissector is still attached, and there
> > are no other references to the prog, we leak the reference and the program
> > remains loaded.
> >
> > Leak can be reproduced by running flow dissector tests from selftests/bpf:
> >
> >   # bpftool prog list
> >   # ./test_flow_dissector.sh
> >   ...
> >   selftests: test_flow_dissector [PASS]
> >   # bpftool prog list
> >   4: flow_dissector name _dissect tag e314084d332a5338 gpl
> >           loaded_at 2020-05-20T18:50:53+0200 uid 0
> >           xlated 552B jited 355B memlock 4096B map_ids 3,4
> >           btf_id 4
> >   #
> >
> > Fix it by detaching the flow dissector program when netns is going away.
> >
> > Fixes: d58e468b1112 ("flow_dissector: implements flow dissector BPF hook")
> > Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
> > ---
> >
> > Discovered while working on bpf_link support for netns-attached progs.
> > Looks like bpf tree material so pushing it out separately.
> >
> > -jkbs
> >
>
> [...]
>
> >  /**
> >   * __skb_flow_get_ports - extract the upper layer ports and return them
> >   * @skb: sk_buff to extract the ports from
> > @@ -1827,6 +1848,8 @@ EXPORT_SYMBOL(flow_keys_basic_dissector);
> >
> >  static int __init init_default_flow_dissectors(void)
> >  {
> > +	int err;
> > +
> >  	skb_flow_dissector_init(&flow_keys_dissector,
> >  				flow_keys_dissector_keys,
> >  				ARRAY_SIZE(flow_keys_dissector_keys));
> > @@ -1836,7 +1859,11 @@ static int __init init_default_flow_dissectors(void)
> >  	skb_flow_dissector_init(&flow_keys_basic_dissector,
> >  				flow_keys_basic_dissector_keys,
> >  				ARRAY_SIZE(flow_keys_basic_dissector_keys));
> > -	return 0;
> > +
> > +	err = register_pernet_subsys(&flow_dissector_pernet_ops);
> > +
> > +	WARN_ON(err);
>
> syzbot simulates memory allocation failures, which can bubble up here,
> so this WARN_ON will probably trigger. I wonder if this could be
> rewritten so that init fails, when registration fails? What are the
> consequences?

good catch. that warn is pointless.
I removed it and force pushed the bpf tree.
On Thu, 21 May 2020 17:53:14 -0700 Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> On Thu, May 21, 2020 at 12:09 PM Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
> >
> > On Wed, May 20, 2020 at 10:24 AM Jakub Sitnicki <jakub@cloudflare.com> wrote:
> > >
> > > When attaching a flow dissector program to a network namespace with
> > > bpf(BPF_PROG_ATTACH, ...) we grab a reference to bpf_prog.
> > >
> > > If netns gets destroyed while a flow dissector is still attached, and there
> > > are no other references to the prog, we leak the reference and the program
> > > remains loaded.
> > >
> > > Leak can be reproduced by running flow dissector tests from selftests/bpf:
> > >
> > >   # bpftool prog list
> > >   # ./test_flow_dissector.sh
> > >   ...
> > >   selftests: test_flow_dissector [PASS]
> > >   # bpftool prog list
> > >   4: flow_dissector name _dissect tag e314084d332a5338 gpl
> > >           loaded_at 2020-05-20T18:50:53+0200 uid 0
> > >           xlated 552B jited 355B memlock 4096B map_ids 3,4
> > >           btf_id 4
> > >   #
> > >
> > > Fix it by detaching the flow dissector program when netns is going away.
> > >
> > > Fixes: d58e468b1112 ("flow_dissector: implements flow dissector BPF hook")
> > > Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
> > > ---
> > >
> > > Discovered while working on bpf_link support for netns-attached progs.
> > > Looks like bpf tree material so pushing it out separately.
> > >
> > > -jkbs
> > >
> >
> > [...]
> >
> > >  /**
> > >   * __skb_flow_get_ports - extract the upper layer ports and return them
> > >   * @skb: sk_buff to extract the ports from
> > > @@ -1827,6 +1848,8 @@ EXPORT_SYMBOL(flow_keys_basic_dissector);
> > >
> > >  static int __init init_default_flow_dissectors(void)
> > >  {
> > > +	int err;
> > > +
> > >  	skb_flow_dissector_init(&flow_keys_dissector,
> > >  				flow_keys_dissector_keys,
> > >  				ARRAY_SIZE(flow_keys_dissector_keys));
> > > @@ -1836,7 +1859,11 @@ static int __init init_default_flow_dissectors(void)
> > >  	skb_flow_dissector_init(&flow_keys_basic_dissector,
> > >  				flow_keys_basic_dissector_keys,
> > >  				ARRAY_SIZE(flow_keys_basic_dissector_keys));
> > > -	return 0;
> > > +
> > > +	err = register_pernet_subsys(&flow_dissector_pernet_ops);
> > > +
> > > +	WARN_ON(err);
> >
> > syzbot simulates memory allocation failures, which can bubble up here,
> > so this WARN_ON will probably trigger. I wonder if this could be
> > rewritten so that init fails, when registration fails? What are the
> > consequences?
>
> good catch. that warn is pointless.
> I removed it and force pushed the bpf tree.

Thanks for patching it up. I'll keep it in mind next time.
diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index 3eff84824c8b..b6179cd20158 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -179,6 +179,27 @@ int skb_flow_dissector_bpf_prog_detach(const union bpf_attr *attr)
 	return 0;
 }
 
+static void __net_exit flow_dissector_pernet_pre_exit(struct net *net)
+{
+	struct bpf_prog *attached;
+
+	/* We don't lock the update-side because there are no
+	 * references left to this netns when we get called. Hence
+	 * there can be no attach/detach in progress.
+	 */
+	rcu_read_lock();
+	attached = rcu_dereference(net->flow_dissector_prog);
+	if (attached) {
+		RCU_INIT_POINTER(net->flow_dissector_prog, NULL);
+		bpf_prog_put(attached);
+	}
+	rcu_read_unlock();
+}
+
+static struct pernet_operations flow_dissector_pernet_ops __net_initdata = {
+	.pre_exit = flow_dissector_pernet_pre_exit,
+};
+
 /**
  * __skb_flow_get_ports - extract the upper layer ports and return them
  * @skb: sk_buff to extract the ports from
@@ -1827,6 +1848,8 @@ EXPORT_SYMBOL(flow_keys_basic_dissector);
 
 static int __init init_default_flow_dissectors(void)
 {
+	int err;
+
 	skb_flow_dissector_init(&flow_keys_dissector,
 				flow_keys_dissector_keys,
 				ARRAY_SIZE(flow_keys_dissector_keys));
@@ -1836,7 +1859,11 @@ static int __init init_default_flow_dissectors(void)
 	skb_flow_dissector_init(&flow_keys_basic_dissector,
 				flow_keys_basic_dissector_keys,
 				ARRAY_SIZE(flow_keys_basic_dissector_keys));
-	return 0;
+
+	err = register_pernet_subsys(&flow_dissector_pernet_ops);
+
+	WARN_ON(err);
+	return err;
 }
 
 core_initcall(init_default_flow_dissectors);
When attaching a flow dissector program to a network namespace with
bpf(BPF_PROG_ATTACH, ...) we grab a reference to bpf_prog.

If netns gets destroyed while a flow dissector is still attached, and there
are no other references to the prog, we leak the reference and the program
remains loaded.

Leak can be reproduced by running flow dissector tests from selftests/bpf:

  # bpftool prog list
  # ./test_flow_dissector.sh
  ...
  selftests: test_flow_dissector [PASS]
  # bpftool prog list
  4: flow_dissector name _dissect tag e314084d332a5338 gpl
          loaded_at 2020-05-20T18:50:53+0200 uid 0
          xlated 552B jited 355B memlock 4096B map_ids 3,4
          btf_id 4
  #

Fix it by detaching the flow dissector program when netns is going away.

Fixes: d58e468b1112 ("flow_dissector: implements flow dissector BPF hook")
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---

Discovered while working on bpf_link support for netns-attached progs.
Looks like bpf tree material so pushing it out separately.

-jkbs

 net/core/flow_dissector.c | 29 ++++++++++++++++++++++++++++-
 1 file changed, 28 insertions(+), 1 deletion(-)
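The reference accounting behind the leak can be illustrated with a toy userspace model. This is a sketch, not kernel code: `struct prog`, `struct netns`, `prog_get()`, `prog_put()`, `netns_attach()` and `netns_pre_exit()` are hypothetical names that only mimic the roles of bpf_prog, struct net, the attach path and the pre_exit callback, and there is no RCU or locking here.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for struct bpf_prog: only the reference count matters. */
struct prog {
	int refcnt;
};

/* Toy stand-in for struct net and its flow_dissector_prog pointer. */
struct netns {
	struct prog *flow_dissector_prog;
};

void prog_get(struct prog *p)
{
	p->refcnt++;
}

void prog_put(struct prog *p)
{
	assert(p->refcnt > 0);
	p->refcnt--;	/* a real put would free the program at zero */
}

/* Attach takes a reference on behalf of the namespace. */
int netns_attach(struct netns *net, struct prog *p)
{
	if (net->flow_dissector_prog)
		return -1;	/* something is already attached */
	prog_get(p);
	net->flow_dissector_prog = p;
	return 0;
}

/*
 * Models the teardown hook: drop the namespace's reference, if any.
 * Without this step the count taken in netns_attach() is never released,
 * which is the leak described in the commit message.
 */
void netns_pre_exit(struct netns *net)
{
	struct prog *attached = net->flow_dissector_prog;

	if (attached) {
		net->flow_dissector_prog = NULL;
		prog_put(attached);
	}
}
```

In this model, a namespace that goes away without calling `netns_pre_exit()` leaves `refcnt` one higher than it should be, so the program can never reach zero and stays loaded.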