
[bpf,0/4] bpf: remove __rcu annotations from bpf_prog_array

Message ID 20190508171845.201303-1-sdf@google.com

Message

Stanislav Fomichev May 8, 2019, 5:18 p.m. UTC
Right now we are not using rcu api correctly: we pass __rcu pointers
to bpf_prog_array_xyz routines but don't use rcu_dereference on them
(see bpf_prog_array_delete_safe and bpf_prog_array_copy in particular).
Instead of sprinkling rcu_dereferences, let's just get rid of those
__rcu annotations and move rcu handling to a higher level.

It looks like all those routines are called from the rcu update
side and we can use simple rcu_dereference_protected to get a
reference that is valid as long as we hold a mutex (i.e. no other
updater can change the pointer, no need for rcu read section and
there should not be a use-after-free problem).

To be fair, there is currently no issue with the existing approach
since the calls are mutex-protected, pointer values don't change,
__rcu annotations are ignored. But it's still nice to use proper api.
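
As a concrete illustration of the intended pattern, here is a minimal userspace sketch; the macros are simplified stand-ins for the kernel primitives (sparse's __rcu compiles away, and the real rcu_dereference_protected() also checks the lockdep condition), and all other names (prog_array, query_length, demo) are made up for illustration:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel primitives. */
#define __rcu
#define rcu_dereference_protected(p, c) (p)
#define lockdep_is_held(m) 1
#define mutex_lock(m)   ((void)(m))
#define mutex_unlock(m) ((void)(m))

/* Hypothetical array type standing in for struct bpf_prog_array. */
struct prog_array {
	size_t cnt;
};

static int progs_mutex; /* placeholder for a real struct mutex */
static struct prog_array __rcu *progs; /* __rcu stays on the variable */

/* After the series: the helper takes a plain pointer and is not
 * rcu-aware at all. */
static size_t prog_array_length(const struct prog_array *p)
{
	return p ? p->cnt : 0;
}

/* Update-side caller: strip the annotation once, under the mutex,
 * then pass the plain pointer to every helper. */
static size_t query_length(void)
{
	struct prog_array *p;
	size_t n;

	mutex_lock(&progs_mutex);
	p = rcu_dereference_protected(progs, lockdep_is_held(&progs_mutex));
	n = prog_array_length(p);
	mutex_unlock(&progs_mutex);
	return n;
}

/* Test hook: install an array of a given length and query it. */
static size_t demo(size_t cnt)
{
	static struct prog_array a;

	a.cnt = cnt;
	progs = &a;
	return query_length();
}
```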

The series fixes the following sparse warnings:

kernel/bpf/core.c:1876:44: warning: incorrect type in initializer (different address spaces)
kernel/bpf/core.c:1876:44:    expected struct bpf_prog_array_item *item
kernel/bpf/core.c:1876:44:    got struct bpf_prog_array_item [noderef] <asn:4> *
kernel/bpf/core.c:1900:26: warning: incorrect type in assignment (different address spaces)
kernel/bpf/core.c:1900:26:    expected struct bpf_prog_array_item *existing
kernel/bpf/core.c:1900:26:    got struct bpf_prog_array_item [noderef] <asn:4> *
kernel/bpf/core.c:1934:26: warning: incorrect type in assignment (different address spaces)
kernel/bpf/core.c:1934:26:    expected struct bpf_prog_array_item *[assigned] existing
kernel/bpf/core.c:1934:26:    got struct bpf_prog_array_item [noderef] <asn:4> *

Stanislav Fomichev (4):
  bpf: remove __rcu annotations from bpf_prog_array
  bpf: media: properly use bpf_prog_array api
  bpf: cgroup: properly use bpf_prog_array api
  bpf: tracing: properly use bpf_prog_array api

 drivers/media/rc/bpf-lirc.c | 27 +++++++++++++++++----------
 include/linux/bpf-cgroup.h  |  2 +-
 include/linux/bpf.h         | 12 ++++++------
 kernel/bpf/cgroup.c         | 27 +++++++++++++++++----------
 kernel/bpf/core.c           | 31 ++++++++++++-------------------
 kernel/trace/bpf_trace.c    | 18 ++++++++++--------
 6 files changed, 63 insertions(+), 54 deletions(-)

Comments

Alexei Starovoitov May 8, 2019, 5:56 p.m. UTC | #1
On Wed, May 08, 2019 at 10:18:41AM -0700, Stanislav Fomichev wrote:
> Right now we are not using rcu api correctly: we pass __rcu pointers
> to bpf_prog_array_xyz routines but don't use rcu_dereference on them
> (see bpf_prog_array_delete_safe and bpf_prog_array_copy in particular).
> Instead of sprinkling rcu_dereferences, let's just get rid of those
> __rcu annotations and move rcu handling to a higher level.
> 
> It looks like all those routines are called from the rcu update
> side and we can use simple rcu_dereference_protected to get a
> reference that is valid as long as we hold a mutex (i.e. no other
> updater can change the pointer, no need for rcu read section and
> there should not be a use-after-free problem).
> 
> To be fair, there is currently no issue with the existing approach
> since the calls are mutex-protected, pointer values don't change,
> __rcu annotations are ignored. But it's still nice to use proper api.
> 
> The series fixes the following sparse warnings:

Absolutely not.
please fix it properly.
Removing annotations is not a fix.
Stanislav Fomichev May 8, 2019, 6:12 p.m. UTC | #2
On 05/08, Alexei Starovoitov wrote:
> On Wed, May 08, 2019 at 10:18:41AM -0700, Stanislav Fomichev wrote:
> > Right now we are not using rcu api correctly: we pass __rcu pointers
> > to bpf_prog_array_xyz routines but don't use rcu_dereference on them
> > (see bpf_prog_array_delete_safe and bpf_prog_array_copy in particular).
> > Instead of sprinkling rcu_dereferences, let's just get rid of those
> > __rcu annotations and move rcu handling to a higher level.
> > 
> > It looks like all those routines are called from the rcu update
> > side and we can use simple rcu_dereference_protected to get a
> > reference that is valid as long as we hold a mutex (i.e. no other
> > updater can change the pointer, no need for rcu read section and
> > there should not be a use-after-free problem).
> > 
> > To be fair, there is currently no issue with the existing approach
> > since the calls are mutex-protected, pointer values don't change,
> > __rcu annotations are ignored. But it's still nice to use proper api.
> > 
> > The series fixes the following sparse warnings:
> 
> Absolutely not.
> please fix it properly.
> Removing annotations is not a fix.
I'm fixing it properly by removing the annotations and moving lifetime
management to the upper layer. See commits 2-4, where I fix the users; the
first patch is just the "preparation".

The users are supposed to do:

mutex_lock(&x);
p = rcu_dereference_protected(prog_array, lockdep_is_held(&x));
// ...
// call bpf_prog_array helpers while mutex guarantees that
// the object referenced by p is valid (i.e. no need for bpf_prog_array
// helpers to care about rcu lifetime)
// ...
mutex_unlock(&x);

What am I missing here?
Stanislav Fomichev May 13, 2019, 6:57 p.m. UTC | #3
On 05/08, Stanislav Fomichev wrote:
> On 05/08, Alexei Starovoitov wrote:
> > On Wed, May 08, 2019 at 10:18:41AM -0700, Stanislav Fomichev wrote:
> > > Right now we are not using rcu api correctly: we pass __rcu pointers
> > > to bpf_prog_array_xyz routines but don't use rcu_dereference on them
> > > (see bpf_prog_array_delete_safe and bpf_prog_array_copy in particular).
> > > Instead of sprinkling rcu_dereferences, let's just get rid of those
> > > __rcu annotations and move rcu handling to a higher level.
> > > 
> > > It looks like all those routines are called from the rcu update
> > > side and we can use simple rcu_dereference_protected to get a
> > > reference that is valid as long as we hold a mutex (i.e. no other
> > > updater can change the pointer, no need for rcu read section and
> > > there should not be a use-after-free problem).
> > > 
> > > To be fair, there is currently no issue with the existing approach
> > > since the calls are mutex-protected, pointer values don't change,
> > > __rcu annotations are ignored. But it's still nice to use proper api.
> > > 
> > > The series fixes the following sparse warnings:
> > 
> > Absolutely not.
> > please fix it properly.
> > Removing annotations is not a fix.
> I'm fixing it properly by removing the annotations and moving lifetime
> management to the upper layer. See commits 2-4 where I fix the users, the
> first patch is just the "preparation".
> 
> The users are supposed to do:
> 
> mutex_lock(&x);
> p = rcu_dereference_protected(prog_array, lockdep_is_held(&x))
> // ...
> // call bpf_prog_array helpers while mutex guarantees that
> // the object referenced by p is valid (i.e. no need for bpf_prog_array
> // helpers to care about rcu lifetime)
> // ...
> mutex_unlock(&x);
> 
> What am I missing here?

Just to give you my perspective on why the current api with __rcu annotations
is working, but not optimal (even if used from the rcu read section).

Sample code:

	struct bpf_prog_array __rcu *progs = <comes from somewhere>;
	int n;

	rcu_read_lock();
	n = bpf_prog_array_length(progs);
	if (n > 0) {
	  // do something with __rcu progs
	  do_something(progs);
	}
	rcu_read_unlock();

Since progs is __rcu annotated, do_something() would need to do
rcu_dereference again and it might get a different value compared to
whatever bpf_prog_array_free got while doing its dereference.

A better way is not to deal with rcu inside those helpers and let
higher layers do that:

	struct bpf_prog_array __rcu *progs = <comes from somewhere>;
	struct bpf_prog_array *p;
	int n;

	rcu_read_lock();
	p = rcu_dereference(progs);
	n = bpf_prog_array_length(p);
	if (n > 0) {
	  do_something(p); // do_something sees the same p as bpf_prog_array_length
	}
	rcu_read_unlock();

What do you think?

If it sounds reasonable, I can follow up with a v2 because I think I can
replace xchg with rcu_swap_protected as well. Or I can resend for bpf-next to
have another round of discussion. Thoughts?
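
For the xchg to rcu_swap_protected change mentioned above, the swap could be modeled in userspace roughly like this; the macros are simplified stand-ins (the real ones carry lockdep checks and memory-ordering guarantees), and `effective`/`install`/`demo` are hypothetical names, not the kernel code:

```c
#include <stddef.h>

/* Simplified stand-ins for the kernel primitives. */
#define rcu_dereference_protected(p, c) (p)
#define rcu_assign_pointer(p, v) ((p) = (v))

/* Model of rcu_swap_protected(): publish @ptr and hand the old value
 * back through the same variable, mirroring the kernel macro's shape. */
#define rcu_swap_protected(rcu_ptr, ptr, c)				\
	do {								\
		__typeof__(ptr) __tmp =					\
			rcu_dereference_protected((rcu_ptr), (c));	\
		rcu_assign_pointer((rcu_ptr), (ptr));			\
		(ptr) = __tmp;						\
	} while (0)

struct prog_array {
	int cnt;
};

static struct prog_array *effective;

/* Swap in @new and return the previous array, as an updater holding
 * the mutex would do before freeing the old copy. */
static struct prog_array *install(struct prog_array *new)
{
	rcu_swap_protected(effective, new, 1 /* lockdep_is_held(...) */);
	return new; /* after the swap, "new" holds the old pointer */
}

/* Test hook: verify the swap publishes the new array and returns
 * the old one. */
static int demo(void)
{
	static struct prog_array a = { 1 }, b = { 2 };
	struct prog_array *old;

	effective = &a;
	old = install(&b);
	return old == &a && effective == &b;
}
```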
Alexei Starovoitov May 14, 2019, 4:55 p.m. UTC | #4
On Mon, May 13, 2019 at 11:57 AM Stanislav Fomichev <sdf@fomichev.me> wrote:
>
> On 05/08, Stanislav Fomichev wrote:
> > On 05/08, Alexei Starovoitov wrote:
> > > On Wed, May 08, 2019 at 10:18:41AM -0700, Stanislav Fomichev wrote:
> > > > Right now we are not using rcu api correctly: we pass __rcu pointers
> > > > to bpf_prog_array_xyz routines but don't use rcu_dereference on them
> > > > (see bpf_prog_array_delete_safe and bpf_prog_array_copy in particular).
> > > > Instead of sprinkling rcu_dereferences, let's just get rid of those
> > > > __rcu annotations and move rcu handling to a higher level.
> > > >
> > > > It looks like all those routines are called from the rcu update
> > > > side and we can use simple rcu_dereference_protected to get a
> > > > reference that is valid as long as we hold a mutex (i.e. no other
> > > > updater can change the pointer, no need for rcu read section and
> > > > there should not be a use-after-free problem).
> > > >
> > > > To be fair, there is currently no issue with the existing approach
> > > > since the calls are mutex-protected, pointer values don't change,
> > > > __rcu annotations are ignored. But it's still nice to use proper api.
> > > >
> > > > The series fixes the following sparse warnings:
> > >
> > > Absolutely not.
> > > please fix it properly.
> > > Removing annotations is not a fix.
> > I'm fixing it properly by removing the annotations and moving lifetime
> > management to the upper layer. See commits 2-4 where I fix the users, the
> > first patch is just the "preparation".
> >
> > The users are supposed to do:
> >
> > mutex_lock(&x);
> > p = rcu_dereference_protected(prog_array, lockdep_is_held(&x))
> > // ...
> > // call bpf_prog_array helpers while mutex guarantees that
> > // the object referenced by p is valid (i.e. no need for bpf_prog_array
> > // helpers to care about rcu lifetime)
> > // ...
> > mutex_unlock(&x);
> >
> > What am I missing here?
>
> Just to give you my perspective on why current api with __rcu annotations
> is working, but not optimal (even if used from the rcu read section).
>
> Sample code:
>
>         struct bpf_prog_array __rcu *progs = <comes from somewhere>;
>         int n;
>
>         rcu_read_lock();
>         n = bpf_prog_array_length(progs);
>         if (n > 0) {
>           // do something with __rcu progs
>           do_something(progs);
>         }
>         rcu_read_unlock();
>
> Since progs is __rcu annotated, do_something() would need to do
> rcu_dereference again and it might get a different value compared to
> whatever bpf_prog_array_free got while doing its dereference.

correct and I believe the code deals with it fine.
cnt could be different between two calls.

> A better way is not to deal with rcu inside those helpers and let
> higher layers do that:
>
>         struct bpf_prog_array __rcu *progs = <comes from somewhere>;
>         struct bpf_prog_array *p;
>         int n;
>
>         rcu_read_lock();
>         p = rcu_dereference(progs);
>         n = bpf_prog_array_length(p);
>         if (n > 0) {
>           do_something(p); // do_something sees the same p as bpf_prog_array_length
>         }
>         rcu_read_unlock();
>
> What do you think?

I'm not sure that's generically applicable.
Which piece of code do you have in mind for such refactoring?
Stanislav Fomichev May 14, 2019, 5:30 p.m. UTC | #5
On 05/14, Alexei Starovoitov wrote:
> On Mon, May 13, 2019 at 11:57 AM Stanislav Fomichev <sdf@fomichev.me> wrote:
> >
> > On 05/08, Stanislav Fomichev wrote:
> > > On 05/08, Alexei Starovoitov wrote:
> > > > On Wed, May 08, 2019 at 10:18:41AM -0700, Stanislav Fomichev wrote:
> > > > > Right now we are not using rcu api correctly: we pass __rcu pointers
> > > > > to bpf_prog_array_xyz routines but don't use rcu_dereference on them
> > > > > (see bpf_prog_array_delete_safe and bpf_prog_array_copy in particular).
> > > > > Instead of sprinkling rcu_dereferences, let's just get rid of those
> > > > > __rcu annotations and move rcu handling to a higher level.
> > > > >
> > > > > It looks like all those routines are called from the rcu update
> > > > > side and we can use simple rcu_dereference_protected to get a
> > > > > reference that is valid as long as we hold a mutex (i.e. no other
> > > > > updater can change the pointer, no need for rcu read section and
> > > > > there should not be a use-after-free problem).
> > > > >
> > > > > To be fair, there is currently no issue with the existing approach
> > > > > since the calls are mutex-protected, pointer values don't change,
> > > > > __rcu annotations are ignored. But it's still nice to use proper api.
> > > > >
> > > > > The series fixes the following sparse warnings:
> > > >
> > > > Absolutely not.
> > > > please fix it properly.
> > > > Removing annotations is not a fix.
> > > I'm fixing it properly by removing the annotations and moving lifetime
> > > management to the upper layer. See commits 2-4 where I fix the users, the
> > > first patch is just the "preparation".
> > >
> > > The users are supposed to do:
> > >
> > > mutex_lock(&x);
> > > p = rcu_dereference_protected(prog_array, lockdep_is_held(&x))
> > > // ...
> > > // call bpf_prog_array helpers while mutex guarantees that
> > > // the object referenced by p is valid (i.e. no need for bpf_prog_array
> > > // helpers to care about rcu lifetime)
> > > // ...
> > > mutex_unlock(&x);
> > >
> > > What am I missing here?
> >
> > Just to give you my perspective on why current api with __rcu annotations
> > is working, but not optimal (even if used from the rcu read section).
> >
> > Sample code:
> >
> >         struct bpf_prog_array __rcu *progs = <comes from somewhere>;
> >         int n;
> >
> >         rcu_read_lock();
> >         n = bpf_prog_array_length(progs);
> >         if (n > 0) {
> >           // do something with __rcu progs
> >           do_something(progs);
> >         }
> >         rcu_read_unlock();
> >
> > Since progs is __rcu annotated, do_something() would need to do
> > rcu_dereference again and it might get a different value compared to
> > whatever bpf_prog_array_free got while doing its dereference.
> 
> correct and I believe the code deals with it fine.
> cnt could be different between two calls.
Yes, currently there is no problem, all users of these apis
are fine because they are holding a mutex (and are hence in the rcu update
path, i.e. the pointer can't change and they have a consistent view
between the calls).

For example, we currently have:

	int n1, n2;
	mutex_lock(&x);
	n1 = bpf_prog_array_length(progs);
	n2 = bpf_prog_array_length(progs);
	// n1 is guaranteed to be the same as n2 as long as we
	// hold a mutex; single updater only
	...
	mutex_unlock(&x);

Versus:

	rcu_read_lock();
	n1 = bpf_prog_array_length(progs);
	n2 = bpf_prog_array_length(progs);
	// n1 and n2 can be different; rcu_read_lock is all about
	// lifetime
	...
	rcu_read_unlock();

But, as I said, we don't use those __rcu annotated bpf_prog_array
routines in the rcu read section, so we are fine.

(I'm just showing that potentially there might be a problem if we don't move
rcu management away from bpf_prog_array routines and if someone
decides to call them under rcu_read_lock).

> > A better way is not to deal with rcu inside those helpers and let
> > higher layers do that:
> >
> >         struct bpf_prog_array __rcu *progs = <comes from somewhere>;
> >         struct bpf_prog_array *p;
> >         int n;
> >
> >         rcu_read_lock();
> >         p = rcu_dereference(progs);
> >         n = bpf_prog_array_length(p);
> >         if (n > 0) {
> >           do_something(p); // do_something sees the same p as bpf_prog_array_length
> >         }
> >         rcu_read_unlock();
> >
> > What do you think?
> 
> I'm not sure that generically applicable.
> Which piece of code do you have in mind for such refactoring?
All existing callers of (take a look at patches 2-4):

* bpf_prog_array_free
* bpf_prog_array_length
* bpf_prog_array_copy_to_user
* bpf_prog_array_delete_safe
* bpf_prog_array_copy_info
* bpf_prog_array_copy

They are:

* perf event bpf attach/detach/query (under bpf_event_mutex)
* cgroup attach/detach/query (management of cgroup_bpf->effective, under
  cgroup_mutex)
* lirc bpf attach/detach/query (under ir_raw_handler_lock)

Nobody uses these apis in the rcu read section, so we can remove the
annotations and use rcu_dereference_protected on the higher layer.

Side bonus is that we can also remove __rcu from cgroup_bpf.inactive
(which is just a temp storage and not updated in the rcu fashion) and
from old_array in activate_effective_progs (which is an on-stack
array and, I guess, only has __rcu annotation to satisfy sparse).

So nothing is changing functionality-wise, but it becomes a bit easier
to reason about by being more explicit with rcu api.
Alexei Starovoitov May 14, 2019, 5:45 p.m. UTC | #6
On Tue, May 14, 2019 at 10:30:02AM -0700, Stanislav Fomichev wrote:
> On 05/14, Alexei Starovoitov wrote:
> > On Mon, May 13, 2019 at 11:57 AM Stanislav Fomichev <sdf@fomichev.me> wrote:
> > >
> > > On 05/08, Stanislav Fomichev wrote:
> > > > On 05/08, Alexei Starovoitov wrote:
> > > > > On Wed, May 08, 2019 at 10:18:41AM -0700, Stanislav Fomichev wrote:
> > > > > > Right now we are not using rcu api correctly: we pass __rcu pointers
> > > > > > to bpf_prog_array_xyz routines but don't use rcu_dereference on them
> > > > > > (see bpf_prog_array_delete_safe and bpf_prog_array_copy in particular).
> > > > > > Instead of sprinkling rcu_dereferences, let's just get rid of those
> > > > > > __rcu annotations and move rcu handling to a higher level.
> > > > > >
> > > > > > It looks like all those routines are called from the rcu update
> > > > > > side and we can use simple rcu_dereference_protected to get a
> > > > > > reference that is valid as long as we hold a mutex (i.e. no other
> > > > > > updater can change the pointer, no need for rcu read section and
> > > > > > there should not be a use-after-free problem).
> > > > > >
> > > > > > To be fair, there is currently no issue with the existing approach
> > > > > > since the calls are mutex-protected, pointer values don't change,
> > > > > > __rcu annotations are ignored. But it's still nice to use proper api.
> > > > > >
> > > > > > The series fixes the following sparse warnings:
> > > > >
> > > > > Absolutely not.
> > > > > please fix it properly.
> > > > > Removing annotations is not a fix.
> > > > I'm fixing it properly by removing the annotations and moving lifetime
> > > > management to the upper layer. See commits 2-4 where I fix the users, the
> > > > first patch is just the "preparation".
> > > >
> > > > The users are supposed to do:
> > > >
> > > > mutex_lock(&x);
> > > > p = rcu_dereference_protected(prog_array, lockdep_is_held(&x))
> > > > // ...
> > > > // call bpf_prog_array helpers while mutex guarantees that
> > > > // the object referenced by p is valid (i.e. no need for bpf_prog_array
> > > > // helpers to care about rcu lifetime)
> > > > // ...
> > > > mutex_unlock(&x);
> > > >
> > > > What am I missing here?
> > >
> > > Just to give you my perspective on why current api with __rcu annotations
> > > is working, but not optimal (even if used from the rcu read section).
> > >
> > > Sample code:
> > >
> > >         struct bpf_prog_array __rcu *progs = <comes from somewhere>;
> > >         int n;
> > >
> > >         rcu_read_lock();
> > >         n = bpf_prog_array_length(progs);
> > >         if (n > 0) {
> > >           // do something with __rcu progs
> > >           do_something(progs);
> > >         }
> > >         rcu_read_unlock();
> > >
> > > Since progs is __rcu annotated, do_something() would need to do
> > > rcu_dereference again and it might get a different value compared to
> > > whatever bpf_prog_array_free got while doing its dereference.
> > 
> > correct and I believe the code deals with it fine.
> > cnt could be different between two calls.
> Yes, currently there is no problem, all users of these apis
> are fine because they are holding a mutex (and are hence in the rcu update
> path, i.e. the pointer can't change and they have a consistent view
> between the calls).
> 
> For example, we currently have:
> 
> 	int n1, n2;
> 	mutex_lock(&x);
> 	n1 = bpf_prog_array_length(progs);
> 	n2 = bpf_prog_array_length(progs);
> 	// n1 is guaranteed to be the same as n2 as long as we
> 	// hold a mutex; single updater only
> 	...
> 	mutex_unlock(&x);
> 
> Versus:
> 
> 	rcu_read_lock();
> 	n1 = bpf_prog_array_length(progs);
> 	n2 = bpf_prog_array_length(progs);
> 	// n1 and n2 can be different; rcu_read_lock is all about
> 	// lifetime
> 	...
> 	rcu_read_unlock();
> 
> But, as I said, we don't use those __rcu annotated bpf_prog_array
> routines in the rcu read section, so we are fine.
> 
> (I'm just showing that potentially there might be a problem if we don't move
> rcu management away from bpf_prog_array routines and if someone
> decides to call them under rcu_read_lock).
> 
> > > A better way is not to deal with rcu inside those helpers and let
> > > higher layers do that:
> > >
> > >         struct bpf_prog_array __rcu *progs = <comes from somewhere>;
> > >         struct bpf_prog_array *p;
> > >         int n;
> > >
> > >         rcu_read_lock();
> > >         p = rcu_dereference(progs);
> > >         n = bpf_prog_array_length(p);
> > >         if (n > 0) {
> > >           do_something(p); // do_something sees the same p as bpf_prog_array_length
> > >         }
> > >         rcu_read_unlock();
> > >
> > > What do you think?
> > 
> > I'm not sure that generically applicable.
> > Which piece of code do you have in mind for such refactoring?
> All existing callers of (take a look at patches 2-4):
> 
> * bpf_prog_array_free
> * bpf_prog_array_length
> * bpf_prog_array_copy_to_user
> * bpf_prog_array_delete_safe
> * bpf_prog_array_copy_info
> * bpf_prog_array_copy
> 
> They are:
> 
> * perf event bpf attach/detach/query (under bpf_event_mutex)
> * cgroup attach/detach/query (management of cgroup_bpf->effective, under
>   cgroup_mutex)
> * lirc bpf attach/detach/query (under ir_raw_handler_lock)
> 
> Nobody uses these apis in the rcu read section, so we can remove the
> annotations and use rcu_dereference_protected on the higher layer.
> 
> Side bonus is that we can also remove __rcu from cgroup_bpf.inactive
> (which is just a temp storage and not updated in the rcu fashion) and
> from old_array in activate_effective_progs (which is an on-stack
> array and, I guess, only has __rcu annotation to satisfy sparse).
> 
> So nothing is changing functionality-wise, but it becomes a bit easier
> to reason about by being more explicit with rcu api.

disagree. mutex is necessary.
Stanislav Fomichev May 14, 2019, 5:53 p.m. UTC | #7
On 05/14, Alexei Starovoitov wrote:
> On Tue, May 14, 2019 at 10:30:02AM -0700, Stanislav Fomichev wrote:
> > On 05/14, Alexei Starovoitov wrote:
> > > On Mon, May 13, 2019 at 11:57 AM Stanislav Fomichev <sdf@fomichev.me> wrote:
> > > >
> > > > On 05/08, Stanislav Fomichev wrote:
> > > > > On 05/08, Alexei Starovoitov wrote:
> > > > > > On Wed, May 08, 2019 at 10:18:41AM -0700, Stanislav Fomichev wrote:
> > > > > > > Right now we are not using rcu api correctly: we pass __rcu pointers
> > > > > > > to bpf_prog_array_xyz routines but don't use rcu_dereference on them
> > > > > > > (see bpf_prog_array_delete_safe and bpf_prog_array_copy in particular).
> > > > > > > Instead of sprinkling rcu_dereferences, let's just get rid of those
> > > > > > > __rcu annotations and move rcu handling to a higher level.
> > > > > > >
> > > > > > > It looks like all those routines are called from the rcu update
> > > > > > > side and we can use simple rcu_dereference_protected to get a
> > > > > > > reference that is valid as long as we hold a mutex (i.e. no other
> > > > > > > updater can change the pointer, no need for rcu read section and
> > > > > > > there should not be a use-after-free problem).
> > > > > > >
> > > > > > > To be fair, there is currently no issue with the existing approach
> > > > > > > since the calls are mutex-protected, pointer values don't change,
> > > > > > > __rcu annotations are ignored. But it's still nice to use proper api.
> > > > > > >
> > > > > > > The series fixes the following sparse warnings:
> > > > > >
> > > > > > Absolutely not.
> > > > > > please fix it properly.
> > > > > > Removing annotations is not a fix.
> > > > > I'm fixing it properly by removing the annotations and moving lifetime
> > > > > management to the upper layer. See commits 2-4 where I fix the users, the
> > > > > first patch is just the "preparation".
> > > > >
> > > > > The users are supposed to do:
> > > > >
> > > > > mutex_lock(&x);
> > > > > p = rcu_dereference_protected(prog_array, lockdep_is_held(&x))
> > > > > // ...
> > > > > // call bpf_prog_array helpers while mutex guarantees that
> > > > > // the object referenced by p is valid (i.e. no need for bpf_prog_array
> > > > > // helpers to care about rcu lifetime)
> > > > > // ...
> > > > > mutex_unlock(&x);
> > > > >
> > > > > What am I missing here?
> > > >
> > > > Just to give you my perspective on why current api with __rcu annotations
> > > > is working, but not optimal (even if used from the rcu read section).
> > > >
> > > > Sample code:
> > > >
> > > >         struct bpf_prog_array __rcu *progs = <comes from somewhere>;
> > > >         int n;
> > > >
> > > >         rcu_read_lock();
> > > >         n = bpf_prog_array_length(progs);
> > > >         if (n > 0) {
> > > >           // do something with __rcu progs
> > > >           do_something(progs);
> > > >         }
> > > >         rcu_read_unlock();
> > > >
> > > > Since progs is __rcu annotated, do_something() would need to do
> > > > rcu_dereference again and it might get a different value compared to
> > > > whatever bpf_prog_array_free got while doing its dereference.
> > > 
> > > correct and I believe the code deals with it fine.
> > > cnt could be different between two calls.
> > Yes, currently there is no problem, all users of these apis
> > are fine because they are holding a mutex (and are hence in the rcu update
> > path, i.e. the pointer can't change and they have a consistent view
> > between the calls).
> > 
> > For example, we currently have:
> > 
> > 	int n1, n2;
> > 	mutex_lock(&x);
> > 	n1 = bpf_prog_array_length(progs);
> > 	n2 = bpf_prog_array_length(progs);
> > 	// n1 is guaranteed to be the same as n2 as long as we
> > 	// hold a mutex; single updater only
> > 	...
> > 	mutex_unlock(&x);
> > 
> > Versus:
> > 
> > 	rcu_read_lock();
> > 	n1 = bpf_prog_array_length(progs);
> > 	n2 = bpf_prog_array_length(progs);
> > 	// n1 and n2 can be different; rcu_read_lock is all about
> > 	// lifetime
> > 	...
> > 	rcu_read_unlock();
> > 
> > But, as I said, we don't use those __rcu annotated bpf_prog_array
> > routines in the rcu read section, so we are fine.
> > 
> > (I'm just showing that potentially there might be a problem if we don't move
> > rcu management away from bpf_prog_array routines and if someone
> > decides to call them under rcu_read_lock).
> > 
> > > > A better way is not to deal with rcu inside those helpers and let
> > > > higher layers do that:
> > > >
> > > >         struct bpf_prog_array __rcu *progs = <comes from somewhere>;
> > > >         struct bpf_prog_array *p;
> > > >         int n;
> > > >
> > > >         rcu_read_lock();
> > > >         p = rcu_dereference(progs);
> > > >         n = bpf_prog_array_length(p);
> > > >         if (n > 0) {
> > > >           do_something(p); // do_something sees the same p as bpf_prog_array_length
> > > >         }
> > > >         rcu_read_unlock();
> > > >
> > > > What do you think?
> > > 
> > > I'm not sure that generically applicable.
> > > Which piece of code do you have in mind for such refactoring?
> > All existing callers of (take a look at patches 2-4):
> > 
> > * bpf_prog_array_free
> > * bpf_prog_array_length
> > * bpf_prog_array_copy_to_user
> > * bpf_prog_array_delete_safe
> > * bpf_prog_array_copy_info
> > * bpf_prog_array_copy
> > 
> > They are:
> > 
> > * perf event bpf attach/detach/query (under bpf_event_mutex)
> > * cgroup attach/detach/query (management of cgroup_bpf->effective, under
> >   cgroup_mutex)
> > * lirc bpf attach/detach/query (under ir_raw_handler_lock)
> > 
> > Nobody uses these apis in the rcu read section, so we can remove the
> > annotations and use rcu_dereference_protected on the higher layer.
> > 
> > Side bonus is that we can also remove __rcu from cgroup_bpf.inactive
> > (which is just a temp storage and not updated in the rcu fashion) and
> > from old_array in activate_effective_progs (which is an on-stack
> > array and, I guess, only has __rcu annotation to satisfy sparse).
> > 
> > So nothing is changing functionality-wise, but it becomes a bit easier
> > to reason about by being more explicit with rcu api.
> 
> disagree. mutex is necessary.
Oh, for sure, mutex is necessary, I'm not taking away the mutex.

All I'm doing is asking the users of those apis to be more
explicit by using rcu_dereference_protected(progs, lockdep_is_held(&mtx)),
which can be read as "I know I'm holding a mutex and I'm in rcu
update section, progs pointer is not going to change. So as long
as I'm holding a mutex, I can call bpf_prog_array helpers and
get consistent result between the calls".

Existing __rcu annotations don't add anything to the safety.
See, for example, bpf_prog_array_copy_to_user and bpf_prog_array_length
which currently do rcu_read_lock/rcu_read_unlock inside. If we expect
the callers to hold a mutex (i.e. to be on the rcu update side), why
start an rcu read section inside the helpers at all?
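
The inconsistency risk this thread keeps circling around can be modeled deterministically in userspace. The macro is a simplified stand-in, the names are made up, and the "concurrent" update is simulated inline between two calls:

```c
#include <stddef.h>

/* Stand-in: each use re-reads the shared pointer, like the real thing. */
#define rcu_dereference(p) (p)

struct prog_array {
	int cnt;
};

static struct prog_array *shared;

/* Old-style helper: dereferences the shared pointer itself, so two
 * consecutive calls may see different arrays. */
static int length_deref_inside(void)
{
	struct prog_array *p = rcu_dereference(shared);

	return p->cnt;
}

/* New-style helper: operates on a pointer the caller already holds. */
static int length_plain(struct prog_array *p)
{
	return p->cnt;
}

/* Simulate an update landing between two helper calls. */
static int demo(void)
{
	static struct prog_array a = { 1 }, b = { 2 };
	struct prog_array *p;
	int n1, n2, m1, m2;

	shared = &a;
	n1 = length_deref_inside();
	shared = &b;		/* "concurrent" update, modeled inline */
	n2 = length_deref_inside();

	shared = &a;
	p = rcu_dereference(shared);
	m1 = length_plain(p);
	shared = &b;		/* update no longer affects p */
	m2 = length_plain(p);

	/* internal deref diverges; caller-side deref stays consistent */
	return (n1 != n2) && (m1 == m2);
}
```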
Alexei Starovoitov May 15, 2019, 1:25 a.m. UTC | #8
On Tue, May 14, 2019 at 10:53 AM Stanislav Fomichev <sdf@fomichev.me> wrote:
>
> Existing __rcu annotations don't add anything to the safety.

what do you mean?
BPF_PROG_RUN_ARRAY derefs these pointers under rcu.
Stanislav Fomichev May 15, 2019, 2:11 a.m. UTC | #9
On 05/14, Alexei Starovoitov wrote:
> On Tue, May 14, 2019 at 10:53 AM Stanislav Fomichev <sdf@fomichev.me> wrote:
> >
> > Existing __rcu annotations don't add anything to the safety.
> 
> what do you mean?
> BPF_PROG_RUN_ARRAY derefs these pointers under rcu.
And I'm not removing them from the struct definitions, I'm removing __rcu
from the helpers' arguments only. Because those helpers are always called
with the mutex and don't need it. To reiterate: rcu_dereference_protected
is enough to get a pointer (from __rcu annotated) for the duration
of the mutex, helpers can operate on the non-annotated (dereferenced) prog
array.

Read section still does the following (BPF_PROG_RUN_ARRAY):

	rcu_read_lock();
	p = rcu_dereference(__rcu'd progs);
	while (p) {}
	rcu_read_unlock();

And write sections do:

	mutex_lock(&mtx);
	p = rcu_dereference_protected(__rcu'd progs, lockdep_is_held(&mtx));
	// ^^^ does rcu_dereference in the mutex protected section
	bpf_prog_array_length(p);
	bpf_prog_array_copy_to_user(p, ...);
	bpf_prog_array_delete_safe(p);
	bpf_prog_array_copy_info(p);
	bpf_prog_array_copy(p, ...);
	bpf_prog_array_free(p);
	// ^^^ all these helpers are consistent already with or
	// without __rcu annotation because we hold a mutex and
	// guarantee no concurrent updates, so __rcu annotations
	// for their input arguments is not needed.
	mutex_unlock(&mtx);
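
A self-contained userspace model of that read section might look like the following; the macros are no-op stand-ins, and the names plus the NULL-terminated items layout are illustrative, not the kernel's exact struct bpf_prog_array:

```c
#include <stddef.h>

/* No-op stand-ins for the kernel primitives. */
#define rcu_read_lock()    do { } while (0)
#define rcu_read_unlock()  do { } while (0)
#define rcu_dereference(p) (p)

struct prog {
	int ret;
};

struct prog_array {
	struct prog *items[8];	/* NULL-terminated for this sketch */
};

static struct prog_array *effective;

/* Run every program in the array and AND the results together,
 * loosely mirroring what BPF_PROG_RUN_ARRAY does: one dereference
 * under the read lock, then iterate that snapshot. */
static int run_array(void)
{
	struct prog_array *p;
	struct prog **item;
	int ret = 1;

	rcu_read_lock();
	p = rcu_dereference(effective);	/* single deref for the section */
	if (p)
		for (item = p->items; *item; item++)
			ret &= (*item)->ret;
	rcu_read_unlock();
	return ret;
}

/* Test hook: one passing and one failing program yield 0 overall. */
static int demo(void)
{
	static struct prog a = { 1 }, b = { 0 };
	static struct prog_array arr;

	arr.items[0] = &a;
	arr.items[1] = &b;
	arr.items[2] = NULL;
	effective = &arr;
	return run_array();
}
```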
Alexei Starovoitov May 15, 2019, 2:27 a.m. UTC | #10
On Tue, May 14, 2019 at 7:11 PM Stanislav Fomichev <sdf@fomichev.me> wrote:
>
> On 05/14, Alexei Starovoitov wrote:
> > On Tue, May 14, 2019 at 10:53 AM Stanislav Fomichev <sdf@fomichev.me> wrote:
> > >
> > > Existing __rcu annotations don't add anything to the safety.
> >
> > what do you mean?
> > BPF_PROG_RUN_ARRAY derefs these pointers under rcu.
> And I'm not removing them from the struct definitions, I'm removing __rcu
> from the helpers' arguments only. Because those helpers are always called
> with the mutex and don't need it. To reiterate: rcu_dereference_protected
> is enough to get a pointer (from __rcu annotated) for the duration
> of the mutex, helpers can operate on the non-annotated (dereferenced) prog
> array.
>
> Read section still does the following (BPF_PROG_RUN_ARRAY):
>
>         rcu_read_lock();
>         p = rcu_dereference(__rcu'd progs);
>         while (p) {}
>         rcu_read_unlock();
>
> And write sections do:
>
>         mutex_lock(&mtx);
>         p = rcu_dereference_protected(__rcu'd progs, lockdep_is_held(&mtx));
>         // ^^^ does rcu_dereference in the mutex-protected section
>         bpf_prog_array_length(p);
>         bpf_prog_array_copy_to_user(p, ...);
>         bpf_prog_array_delete_safe(p);
>         bpf_prog_array_copy_info(p);
>         bpf_prog_array_copy(p, ...);
>         bpf_prog_array_free(p);

what about activate_effective_progs() ?
I wouldn't want to lose the annotation there.
but then array_free will lose it?
in some cases it's called without mutex in a destruction path.
also how do you propose to solve different 'mtx' in
lockdep_is_held(&mtx)); ?
passing it through the call chain is imo not clean.

I wonder what others think about this whole discussion.

>         // ^^^ all these helpers are consistent already with or
>         // without __rcu annotation because we hold a mutex and
>         // guarantee no concurrent updates, so __rcu annotations
>         // for their input arguments are not needed.
>         mutex_unlock(&mtx);
Eric Dumazet May 15, 2019, 2:44 a.m. UTC | #11
On 5/14/19 7:27 PM, Alexei Starovoitov wrote:

> what about activate_effective_progs() ?
> I wouldn't want to lose the annotation there.
> but then array_free will lose it?
> in some cases it's called without mutex in a destruction path.
> also how do you propose to solve different 'mtx' in
> lockdep_is_held(&mtx)); ?
> passing it through the call chain is imo not clean.
> 

Usage of the RCU API in BPF is indeed a bit strange and lacks lockdep support.

Looking at bpf_prog_array_copy_core() for example, it looks like the __rcu
in the first argument is not needed, since the caller must have done the proper dereference already,
and the caller knows which mutex is protecting its rcu_dereference_protected() for the writer side.

bpf_prog_array_copy_core() should manipulate standard pointers, with no __rcu stuff.

The analogy in net/ are probably the rtnl_dereference() users.
Stanislav Fomichev May 15, 2019, 2:56 a.m. UTC | #12
On 05/14, Eric Dumazet wrote:
> 
> 
> On 5/14/19 7:27 PM, Alexei Starovoitov wrote:
> 
> > what about activate_effective_progs() ?
> > I wouldn't want to lose the annotation there.
> > but then array_free will lose it?
It would not have it because the input is the result of
bpf_prog_array_alloc() which returns kmalloc'd pointer (and
is not bound to an rcu section).

> > in some cases it's called without mutex in a destruction path.
Hm, can you point me to this place? I think I checked every path,
maybe I missed something subtle. I'll double check.

> > also how do you propose to solve different 'mtx' in
> > lockdep_is_held(&mtx)); ?
> > passing it through the call chain is imo not clean.
Every caller would know which mutex protects it. As Eric said below,
I'm adding a bunch of xxx_dereference macros that hardcode the mutex, like
the existing rtnl_dereference.

> Usage of RCU api in BPF is indeed a bit strange and lacks lockdep support.
> 
> Looking at bpf_prog_array_copy_core() for example, it looks like the __rcu
> in the first argument is not needed, since the caller must have done the proper dereference already,
> and the caller knows which mutex is protecting its rcu_dereference_protected() for the writer side.
> 
> bpf_prog_array_copy_core() should manipulate standard pointers, with no __rcu stuff.
> 
> The analogy in net/ are probably the rtnl_dereference() users.
Alexei Starovoitov May 15, 2019, 3:16 a.m. UTC | #13
On Tue, May 14, 2019 at 07:56:36PM -0700, Stanislav Fomichev wrote:
> On 05/14, Eric Dumazet wrote:
> > 
> > 
> > On 5/14/19 7:27 PM, Alexei Starovoitov wrote:
> > 
> > > what about activate_effective_progs() ?
> > > I wouldn't want to lose the annotation there.
> > > but then array_free will lose it?
> It would not have it because the input is the result of
> bpf_prog_array_alloc() which returns kmalloc'd pointer (and
> is not bound to an rcu section).
> 
> > > in some cases it's called without mutex in a destruction path.
> Hm, can you point me to this place? I think I checked every path,
> maybe I missed something subtle. I'll double check.

I thought the cgroup dying thingy was not doing it, but looks like it is.

> > > also how do you propose to solve different 'mtx' in
> > > lockdep_is_held(&mtx)); ?
> > > passing it through the call chain is imo not clean.
> Every caller would know which mutex protects it. As Eric said below,
> I'm adding a bunch of xxx_dereference macros that hardcode mutex, like
> the existing rtnl_dereference.

I have a hard time imagining how it will look without being a mess.
There are three mutexes to pass down instead of the single rtnl_dereference:
cgroup_mutex, ir_raw_handler_lock, bpf_event_mutex.

Anyway, let's see how the patches look and discuss further.
Stanislav Fomichev May 15, 2019, 3:38 a.m. UTC | #14
On Tue, May 14, 2019 at 8:16 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Tue, May 14, 2019 at 07:56:36PM -0700, Stanislav Fomichev wrote:
> > On 05/14, Eric Dumazet wrote:
> > >
> > >
> > > On 5/14/19 7:27 PM, Alexei Starovoitov wrote:
> > >
> > > > what about activate_effective_progs() ?
> > > > I wouldn't want to lose the annotation there.
> > > > but then array_free will lose it?
> > It would not have it because the input is the result of
> > bpf_prog_array_alloc() which returns kmalloc'd pointer (and
> > is not bound to an rcu section).
> >
> > > > in some cases it's called without mutex in a destruction path.
> > Hm, can you point me to this place? I think I checked every path,
> > maybe I missed something subtle. I'll double check.
>
> I thought cgroup dying thingy is not doing it, but looks like it is.
I was looking at the following chain:
css_release_work_fn
  mutex_lock(&cgroup_mutex);
    cgroup_bpf_put
      bpf_prog_array_free
  mutex_unlock(&cgroup_mutex);

I'll take another look tomorrow with a fresh mind :-)

> > > > also how do you propose to solve different 'mtx' in
> > > > lockdep_is_held(&mtx)); ?
> > > > passing it through the call chain is imo not clean.
> > Every caller would know which mutex protects it. As Eric said below,
> > I'm adding a bunch of xxx_dereference macros that hardcode mutex, like
> > the existing rtnl_dereference.
>
> I have a hard time imagining how it will look without being a mess.
> There are three mutexes to pass down instead of the single rtnl_dereference:
> cgroup_mutex, ir_raw_handler_lock, bpf_event_mutex.
We don't need to pass them down; we need those xxx_dereference
wrappers only in the callers of those APIs. They are private
to cgroup.c/lirc.c/bpf_trace.c.

Take a look at the patches 2-4 in the current series where I convert
the callers.

(Though, I'd rename xxx_dereference to xxx_rcu_dereference for clarity if
we get to a v2.)
>
> Anyway, let's see how the patches look and discuss further.
Alexei Starovoitov May 15, 2019, 3:42 a.m. UTC | #15
On Tue, May 14, 2019 at 8:38 PM Stanislav Fomichev <sdf@google.com> wrote:
>
> Take a look at the patches 2-4 in the current series where I convert
> the callers.
>
> (Though, I'd rename xxx_dereference to xxx_rcu_dereference for clarity if
> we get to a v2.)

please make a fresh repost _after_ bpf-next opens.
Stanislav Fomichev May 15, 2019, 3:49 a.m. UTC | #16
On Tue, May 14, 2019 at 8:42 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Tue, May 14, 2019 at 8:38 PM Stanislav Fomichev <sdf@google.com> wrote:
> >
> > Take a look at the patches 2-4 in the current series where I convert
> > the callers.
> >
> > (Though, I'd rename xxx_dereference to xxx_rcu_dereference for clarity if
> > we get to a v2.)
>
> please make a fresh repost _after_ bpf-next opens.
Sure, sgtm, thank you for the preliminary review and input!