Message ID | cover.1507294365.git.pabeni@redhat.com |
---|---|
Headers | show |
Series | RCU: introduce noref debug | expand |
On Fri, Oct 06, 2017 at 02:57:45PM +0200, Paolo Abeni wrote: > The networking subsystem is currently using some kind of long-lived > RCU-protected, references to avoid the overhead of full book-keeping. > > Such references - skb_dst() noref - are stored inside the skbs and can be > moved across relevant slices of the network stack, with the users > being in charge of properly clearing the relevant skb - or properly refcount > the related dst references - before the skb escapes the RCU section. > > We currently don't have any deterministic debug infrastructure to check > the dst noref usages - and the introduction of others noref artifact is > currently under discussion. > > This series tries to tackle the above introducing an RCU debug infrastructure > aimed at spotting incorrect noref pointer usage, in patch one. The > infrastructure is small and must be explicitly enabled via a newly introduced > build option. > > Patch two uses such infrastructure to track dst noref usage in the networking > stack. > > Patch 3 and 4 are bugfixes for small buglet found running this infrastructure > on basic scenarios. This patchset does not look like it handles rcu_read_lock() nesting. For example, given code like this: void foo(void) { rcu_read_lock(); rcu_track_noref(&key2, &noref2, true); do_something(); rcu_track_noref(&key2, &noref2, false); rcu_read_unlock(); } void bar(void) { rcu_read_lock(); rcu_track_noref(&key1, &noref1, true); do_something_more(); foo(); do_something_else(); rcu_track_noref(&key1, &noref1, false); rcu_read_unlock(); } void grill(void) { foo(); } It looks like foo()'s rcu_read_unlock() will complain about key1. You could remove foo()'s rcu_read_lock() and rcu_read_unlock(), but that will break the call from grill(). Or am I missing something subtle here? Given patch 3/4, I suspect not... Thanx, Paul > Paolo Abeni (4): > rcu: introduce noref debug > net: use RCU noref infrastructure to track dst noref > ipv4: drop unneeded and misleading RCU lock in ip_route_input_noref() > tcp: avoid noref dst leak on input path > > include/linux/rcupdate.h | 11 ++++++ > include/linux/skbuff.h | 1 + > include/net/dst.h | 5 +++ > kernel/rcu/Kconfig.debug | 15 ++++++++ > kernel/rcu/Makefile | 1 + > kernel/rcu/noref_debug.c | 89 ++++++++++++++++++++++++++++++++++++++++++++++++ > net/ipv4/route.c | 7 +--- > net/ipv4/tcp_input.c | 5 ++- > 8 files changed, 127 insertions(+), 7 deletions(-) > create mode 100644 kernel/rcu/noref_debug.c > > -- > 2.13.6 >
Hi, On Fri, 2017-10-06 at 06:34 -0700, Paul E. McKenney wrote: > On Fri, Oct 06, 2017 at 02:57:45PM +0200, Paolo Abeni wrote: > > The networking subsystem is currently using some kind of long-lived > > RCU-protected, references to avoid the overhead of full book-keeping. > > > > Such references - skb_dst() noref - are stored inside the skbs and can be > > moved across relevant slices of the network stack, with the users > > being in charge of properly clearing the relevant skb - or properly refcount > > the related dst references - before the skb escapes the RCU section. > > > > We currently don't have any deterministic debug infrastructure to check > > the dst noref usages - and the introduction of others noref artifact is > > currently under discussion. > > > > This series tries to tackle the above introducing an RCU debug infrastructure > > aimed at spotting incorrect noref pointer usage, in patch one. The > > infrastructure is small and must be explicitly enabled via a newly introduced > > build option. > > > > Patch two uses such infrastructure to track dst noref usage in the networking > > stack. > > > > Patch 3 and 4 are bugfixes for small buglet found running this infrastructure > > on basic scenarios. Thank you for the prompt reply! > > This patchset does not look like it handles rcu_read_lock() nesting. > For example, given code like this: > > void foo(void) > { > rcu_read_lock(); > rcu_track_noref(&key2, &noref2, true); > do_something(); > rcu_track_noref(&key2, &noref2, false); > rcu_read_unlock(); > } > > void bar(void) > { > rcu_read_lock(); > rcu_track_noref(&key1, &noref1, true); > do_something_more(); > foo(); > do_something_else(); > rcu_track_noref(&key1, &noref1, false); > rcu_read_unlock(); > } > > void grill(void) > { > foo(); > } > > It looks like foo()'s rcu_read_unlock() will complain about key1. > You could remove foo()'s rcu_read_lock() and rcu_read_unlock(), but > that will break the call from grill(). Actually the code should cope correctly with your example; when foo()'s rcu_read_unlock() is called, 'cache' contains: { { &key1, &noref1, 1}, // ... and when the related __rcu_check_noref() is invoked preempt_count() is 2 - because the check is called before decreasing the preempt counter. In the main loop inside __rcu_check_noref() we will hit always the 'continue' statement because 'cache->store[i].nesting != nesting', so no warn will be triggered. > Or am I missing something subtle here? Given patch 3/4, I suspect not... The problem with the code in patch 3/4 is different; currently ip_route_input_noref() is basically doing: rcu_read_lock(); rcu_track_noref(&key1, &noref1, true); rcu_read_unlock(); So the rcu lock there silence any RCU based check inside ip_route_input_noref() but does not really protect the noref dst. Please let me know if the above clarify the scenario. Thanks, Paolo
On Fri, Oct 06, 2017 at 05:10:09PM +0200, Paolo Abeni wrote: > Hi, > > On Fri, 2017-10-06 at 06:34 -0700, Paul E. McKenney wrote: > > On Fri, Oct 06, 2017 at 02:57:45PM +0200, Paolo Abeni wrote: > > > The networking subsystem is currently using some kind of long-lived > > > RCU-protected, references to avoid the overhead of full book-keeping. > > > > > > Such references - skb_dst() noref - are stored inside the skbs and can be > > > moved across relevant slices of the network stack, with the users > > > being in charge of properly clearing the relevant skb - or properly refcount > > > the related dst references - before the skb escapes the RCU section. > > > > > > We currently don't have any deterministic debug infrastructure to check > > > the dst noref usages - and the introduction of others noref artifact is > > > currently under discussion. > > > > > > This series tries to tackle the above introducing an RCU debug infrastructure > > > aimed at spotting incorrect noref pointer usage, in patch one. The > > > infrastructure is small and must be explicitly enabled via a newly introduced > > > build option. > > > > > > Patch two uses such infrastructure to track dst noref usage in the networking > > > stack. > > > > > > Patch 3 and 4 are bugfixes for small buglet found running this infrastructure > > > on basic scenarios. > > Thank you for the prompt reply! > > > > This patchset does not look like it handles rcu_read_lock() nesting. > > For example, given code like this: > > > > void foo(void) > > { > > rcu_read_lock(); > > rcu_track_noref(&key2, &noref2, true); > > do_something(); > > rcu_track_noref(&key2, &noref2, false); > > rcu_read_unlock(); > > } > > > > void bar(void) > > { > > rcu_read_lock(); > > rcu_track_noref(&key1, &noref1, true); > > do_something_more(); > > foo(); > > do_something_else(); > > rcu_track_noref(&key1, &noref1, false); > > rcu_read_unlock(); > > } > > > > void grill(void) > > { > > foo(); > > } > > > > It looks like foo()'s rcu_read_unlock() will complain about key1. > > You could remove foo()'s rcu_read_lock() and rcu_read_unlock(), but > > that will break the call from grill(). > > Actually the code should cope correctly with your example; when foo()'s > rcu_read_unlock() is called, 'cache' contains: > > { { &key1, &noref1, 1}, // ... > > and when the related __rcu_check_noref() is invoked preempt_count() is > 2 - because the check is called before decreasing the preempt counter. > > In the main loop inside __rcu_check_noref() we will hit always the > 'continue' statement because 'cache->store[i].nesting != nesting', so > no warn will be triggered. You are right, it was too early, and my example wasn't correct. How about this one? void foo(void (*f)(struct s *sp), struct s **spp) { rcu_read_lock(); rcu_track_noref(&key2, &noref2, true); f(spp); rcu_track_noref(&key2, &noref2, false); rcu_read_unlock(); } void barcb(struct s **spp) { *spp = &noref3; rcu_track_noref(&key3, *spp, true); } void bar(void) { struct s *sp; rcu_read_lock(); rcu_track_noref(&key1, &noref1, true); do_something_more(); foo(barcb, &sp); do_something_else(sp); rcu_track_noref(&key3, sp, false); rcu_track_noref(&key1, &noref1, false); rcu_read_unlock(); } void grillcb(struct s **spp) { *spp } void grill(void) { foo(); } How does the user select the key argument? It looks like someone can choose to just pass in NULL -- is that the intent, or am I confused about this as well? > > Or am I missing something subtle here? Given patch 3/4, I suspect not... > > The problem with the code in patch 3/4 is different; currently > ip_route_input_noref() is basically doing: > > rcu_read_lock(); > > rcu_track_noref(&key1, &noref1, true); > > rcu_read_unlock(); > > So the rcu lock there silence any RCU based check inside > ip_route_input_noref() but does not really protect the noref dst. > > Please let me know if the above clarify the scenario. OK. I am also concerned about false negatives when the user invokes rcu_track_noref(..., false) but then leaks the pointer anyway. Or is there something you are doing that catches this that I am missing? Thanx, Paul
On Fri, 2017-10-06 at 09:34 -0700, Paul E. McKenney wrote: > On Fri, Oct 06, 2017 at 05:10:09PM +0200, Paolo Abeni wrote: > > Hi, > > > > On Fri, 2017-10-06 at 06:34 -0700, Paul E. McKenney wrote: > > > On Fri, Oct 06, 2017 at 02:57:45PM +0200, Paolo Abeni wrote: > > > > The networking subsystem is currently using some kind of long-lived > > > > RCU-protected, references to avoid the overhead of full book-keeping. > > > > > > > > Such references - skb_dst() noref - are stored inside the skbs and can be > > > > moved across relevant slices of the network stack, with the users > > > > being in charge of properly clearing the relevant skb - or properly refcount > > > > the related dst references - before the skb escapes the RCU section. > > > > > > > > We currently don't have any deterministic debug infrastructure to check > > > > the dst noref usages - and the introduction of others noref artifact is > > > > currently under discussion. > > > > > > > > This series tries to tackle the above introducing an RCU debug infrastructure > > > > aimed at spotting incorrect noref pointer usage, in patch one. The > > > > infrastructure is small and must be explicitly enabled via a newly introduced > > > > build option. > > > > > > > > Patch two uses such infrastructure to track dst noref usage in the networking > > > > stack. > > > > > > > > Patch 3 and 4 are bugfixes for small buglet found running this infrastructure > > > > on basic scenarios. > > > > Thank you for the prompt reply! > > > > > > This patchset does not look like it handles rcu_read_lock() nesting. > > > For example, given code like this: > > > > > > void foo(void) > > > { > > > rcu_read_lock(); > > > rcu_track_noref(&key2, &noref2, true); > > > do_something(); > > > rcu_track_noref(&key2, &noref2, false); > > > rcu_read_unlock(); > > > } > > > > > > void bar(void) > > > { > > > rcu_read_lock(); > > > rcu_track_noref(&key1, &noref1, true); > > > do_something_more(); > > > foo(); > > > do_something_else(); > > > rcu_track_noref(&key1, &noref1, false); > > > rcu_read_unlock(); > > > } > > > > > > void grill(void) > > > { > > > foo(); > > > } > > > > > > It looks like foo()'s rcu_read_unlock() will complain about key1. > > > You could remove foo()'s rcu_read_lock() and rcu_read_unlock(), but > > > that will break the call from grill(). > > > > Actually the code should cope correctly with your example; when foo()'s > > rcu_read_unlock() is called, 'cache' contains: > > > > { { &key1, &noref1, 1}, // ... > > > > and when the related __rcu_check_noref() is invoked preempt_count() is > > 2 - because the check is called before decreasing the preempt counter. > > > > In the main loop inside __rcu_check_noref() we will hit always the > > 'continue' statement because 'cache->store[i].nesting != nesting', so > > no warn will be triggered. > > You are right, it was too early, and my example wasn't correct. How > about this one? > > void foo(void (*f)(struct s *sp), struct s **spp) > { > rcu_read_lock(); > rcu_track_noref(&key2, &noref2, true); > f(spp); > rcu_track_noref(&key2, &noref2, false); > rcu_read_unlock(); > } > > void barcb(struct s **spp) > { > *spp = &noref3; > rcu_track_noref(&key3, *spp, true); > } > > void bar(void) > { > struct s *sp; > > rcu_read_lock(); > rcu_track_noref(&key1, &noref1, true); > do_something_more(); > foo(barcb, &sp); > do_something_else(sp); > rcu_track_noref(&key3, sp, false); > rcu_track_noref(&key1, &noref1, false); > rcu_read_unlock(); > } > > void grillcb(struct s **spp) > { > *spp > } > > void grill(void) > { > foo(); > } You are right: this will generate a splat, even if the code it safe. The false positive can be avoided looking for leaked references only in the outermost rcu unlook. I did a previous implementation performing such check, but it emitted very generic splat so I tried to be more strict. The latter choice allowed to find/do 3/4. What about using save_stack_trace() in rcu_track_noref(, true) and reporting such stack trace when the check in the outer most rcu fails? the current strict/false-postive-prone check could be enabled under an additional build flag. > How does the user select the key argument? It looks like someone > can choose to just pass in NULL -- is that the intent, or am I confused > about this as well? The 'key' argument is intented to discriminate the scope of the same noref dst attached to different skbs, which happens e.g. as a result of as skb_dst_copy(). In a generic use case, it can be NULL, too. > I am also concerned about false negatives when the user invokes > rcu_track_noref(..., false) but then leaks the pointer anyway. Or is > there something you are doing that catches this that I am missing? If the rcu_track_noref(..., false) is misplaced or missing, yes we can have false negative. In the noref rcu use-case the rcu_track_noref(, false) call is/must be placed side-by-side with the code clearing the the noref pointer, so is/should be quite easy avoiding such mistakes. Cheers, Paolo
On Mon, Oct 09, 2017 at 06:53:12PM +0200, Paolo Abeni wrote: > On Fri, 2017-10-06 at 09:34 -0700, Paul E. McKenney wrote: > > On Fri, Oct 06, 2017 at 05:10:09PM +0200, Paolo Abeni wrote: > > > Hi, > > > > > > On Fri, 2017-10-06 at 06:34 -0700, Paul E. McKenney wrote: > > > > On Fri, Oct 06, 2017 at 02:57:45PM +0200, Paolo Abeni wrote: > > > > > The networking subsystem is currently using some kind of long-lived > > > > > RCU-protected, references to avoid the overhead of full book-keeping. > > > > > > > > > > Such references - skb_dst() noref - are stored inside the skbs and can be > > > > > moved across relevant slices of the network stack, with the users > > > > > being in charge of properly clearing the relevant skb - or properly refcount > > > > > the related dst references - before the skb escapes the RCU section. > > > > > > > > > > We currently don't have any deterministic debug infrastructure to check > > > > > the dst noref usages - and the introduction of others noref artifact is > > > > > currently under discussion. > > > > > > > > > > This series tries to tackle the above introducing an RCU debug infrastructure > > > > > aimed at spotting incorrect noref pointer usage, in patch one. The > > > > > infrastructure is small and must be explicitly enabled via a newly introduced > > > > > build option. > > > > > > > > > > Patch two uses such infrastructure to track dst noref usage in the networking > > > > > stack. > > > > > > > > > > Patch 3 and 4 are bugfixes for small buglet found running this infrastructure > > > > > on basic scenarios. > > > > > > Thank you for the prompt reply! > > > > > > > > This patchset does not look like it handles rcu_read_lock() nesting. > > > > For example, given code like this: > > > > > > > > void foo(void) > > > > { > > > > rcu_read_lock(); > > > > rcu_track_noref(&key2, &noref2, true); > > > > do_something(); > > > > rcu_track_noref(&key2, &noref2, false); > > > > rcu_read_unlock(); > > > > } > > > > > > > > void bar(void) > > > > { > > > > rcu_read_lock(); > > > > rcu_track_noref(&key1, &noref1, true); > > > > do_something_more(); > > > > foo(); > > > > do_something_else(); > > > > rcu_track_noref(&key1, &noref1, false); > > > > rcu_read_unlock(); > > > > } > > > > > > > > void grill(void) > > > > { > > > > foo(); > > > > } > > > > > > > > It looks like foo()'s rcu_read_unlock() will complain about key1. > > > > You could remove foo()'s rcu_read_lock() and rcu_read_unlock(), but > > > > that will break the call from grill(). > > > > > > Actually the code should cope correctly with your example; when foo()'s > > > rcu_read_unlock() is called, 'cache' contains: > > > > > > { { &key1, &noref1, 1}, // ... > > > > > > and when the related __rcu_check_noref() is invoked preempt_count() is > > > 2 - because the check is called before decreasing the preempt counter. > > > > > > In the main loop inside __rcu_check_noref() we will hit always the > > > 'continue' statement because 'cache->store[i].nesting != nesting', so > > > no warn will be triggered. > > > > You are right, it was too early, and my example wasn't correct. How > > about this one? > > > > void foo(void (*f)(struct s *sp), struct s **spp) > > { > > rcu_read_lock(); > > rcu_track_noref(&key2, &noref2, true); > > f(spp); > > rcu_track_noref(&key2, &noref2, false); > > rcu_read_unlock(); > > } > > > > void barcb(struct s **spp) > > { > > *spp = &noref3; > > rcu_track_noref(&key3, *spp, true); > > } > > > > void bar(void) > > { > > struct s *sp; > > > > rcu_read_lock(); > > rcu_track_noref(&key1, &noref1, true); > > do_something_more(); > > foo(barcb, &sp); > > do_something_else(sp); > > rcu_track_noref(&key3, sp, false); > > rcu_track_noref(&key1, &noref1, false); > > rcu_read_unlock(); > > } > > > > void grillcb(struct s **spp) > > { > > *spp > > } > > > > void grill(void) > > { > > foo(); > > } > > You are right: this will generate a splat, even if the code it safe. > The false positive can be avoided looking for leaked references only in > the outermost rcu unlook. I did a previous implementation performing > such check, but it emitted very generic splat so I tried to be more > strict. The latter choice allowed to find/do 3/4. > > What about using save_stack_trace() in rcu_track_noref(, true) and > reporting such stack trace when the check in the outer most rcu fails? > > the current strict/false-postive-prone check could be enabled under an > additional build flag. Linus and Ingo will ask me how users decide how they should set that additional build flag. Especially given that if there is code that requires non-strict checking, isn't everyone required to set up non-strict checking to avoid false positives? Unless you can also configure out all the code that requires non-strict checking, I suppose, but how would you keep track of what to configure out? > > How does the user select the key argument? It looks like someone > > can choose to just pass in NULL -- is that the intent, or am I confused > > about this as well? > > The 'key' argument is intented to discriminate the scope of the same > noref dst attached to different skbs, which happens e.g. as a result of > as skb_dst_copy(). > > In a generic use case, it can be NULL, too. OK. There will probably be some discussion about the API in that case. > > I am also concerned about false negatives when the user invokes > > rcu_track_noref(..., false) but then leaks the pointer anyway. Or is > > there something you are doing that catches this that I am missing? > > If the rcu_track_noref(..., false) is misplaced or missing, yes we can > have false negative. > > In the noref rcu use-case the rcu_track_noref(, false) call is/must be > placed side-by-side with the code clearing the the noref pointer, so > is/should be quite easy avoiding such mistakes. True enough. Except that if people were good about always clearing the pointer, then the pointer couldn't leak, right? Or am I missing something in your use cases? Or to put it another way -- have you been able to catch any real pointer-leak bugs with this? The other RCU debug options have had pretty long found-bug lists before we accepted them. Thanx, Paul
On Tue, 2017-10-10 at 21:02 -0700, Paul E. McKenney wrote: > Linus and Ingo will ask me how users decide how they should set that > additional build flag. Especially given that if there is code that > requires non-strict checking, isn't everyone required to set up non-strict > checking to avoid false positives? Unless you can also configure out > all the code that requires non-strict checking, I suppose, but how > would you keep track of what to configure out? I'm working to a new version using a single compile flag - without additional strict option. I don't know of any other subsytem that stores rcu pointer in datastructures for a longer amount of time. That having said, I wonder if the tests should completely move to the networking subsystem for the time being. The Kconfig option would thus be called NET_DEBUG or something along the lines. For abstraction it would be possible to add an atomic_notifier_chain to the rcu_read/unlock methods, where multiple users or checkers could register for. That way we keep the users seperate from the implementation for the cost of a bit more layering and more indirect calls. But given that this will anyway slow down execution a lot, it will anyway only be suitable in testing/verification/debugging environments. > OK. There will probably be some discussion about the API in that case. I'll drop noref parameter, the key will became mandatory - the exact position of where the reference of RCU managed object is stored. In the case of noref dst it is &skb->_skb_refdst. With this kind of API it should suite more subsystems. > True enough. Except that if people were good about always clearing the > pointer, then the pointer couldn't leak, right? Or am I missing something > in your use cases? This is correct. The dst_entry checking in skb, which this patch series implements there are strict brackets in the sense of skb_dst_set, skb_dst_set_noref, skb_dst_force, etc., which form brackets around the safe uses of those dst_entries. This patch series validates that the correct skb_dst_* functions are being called before the sk_buff leaves the rcu protected section. Thus we don't need to modify and review a lot of code but we can just patch into those helpers already. > Or to put it another way -- have you been able to catch any real > pointer-leak bugs with thister-leak bugs with this? The other RCU > debug options have had pretty long found-bug lists before we accepted > them. There have been two problems found so far, one is a rather minor one while the other one seems like a normal bug. The patches for those are part of this series (3/4 and 4/4). Regards, Paolo
On Wed, Oct 11, 2017 at 04:50:36PM +0200, Paolo Abeni wrote: > On Tue, 2017-10-10 at 21:02 -0700, Paul E. McKenney wrote: > > Linus and Ingo will ask me how users decide how they should set that > > additional build flag. Especially given that if there is code that > > requires non-strict checking, isn't everyone required to set up non-strict > > checking to avoid false positives? Unless you can also configure out > > all the code that requires non-strict checking, I suppose, but how > > would you keep track of what to configure out? > > I'm working to a new version using a single compile flag - without > additional strict option. > > I don't know of any other subsytem that stores rcu pointer in > datastructures for a longer amount of time. That having said, I wonder > if the tests should completely move to the networking subsystem for the > time being. The Kconfig option would thus be called NET_DEBUG or > something along the lines. For abstraction it would be possible to add > an atomic_notifier_chain to the rcu_read/unlock methods, where multiple > users or checkers could register for. That way we keep the users > seperate from the implementation for the cost of a bit more layering > and more indirect calls. But given that this will anyway slow down > execution a lot, it will anyway only be suitable in > testing/verification/debugging environments. I like this approach. And if it does a good job of finding networking bugs over the next year or so, I would be quite happy to consider something for all RCU users. > > OK. There will probably be some discussion about the API in that case. > > I'll drop noref parameter, the key will became mandatory - the exact > position of where the reference of RCU managed object is stored. In the > case of noref dst it is &skb->_skb_refdst. With this kind of API it > should suite more subsystems. Interesting. Do you intend to allow rcu_track_noref() to refuse to record a pointer? Other than for the array-full case. > > True enough. Except that if people were good about always clearing the > > pointer, then the pointer couldn't leak, right? Or am I missing something > > in your use cases? > > This is correct. The dst_entry checking in skb, which this patch series > implements there are strict brackets in the sense of skb_dst_set, > skb_dst_set_noref, skb_dst_force, etc., which form brackets around the > safe uses of those dst_entries. This patch series validates that the > correct skb_dst_* functions are being called before the sk_buff leaves > the rcu protected section. Thus we don't need to modify and review a > lot of code but we can just patch into those helpers already. Makes sense. Those willing to strictly bracket uses gain some debug assist. > > Or to put it another way -- have you been able to catch any real > > pointer-leak bugs with thister-leak bugs with this? The other RCU > > debug options have had pretty long found-bug lists before we accepted > > them. > > There have been two problems found so far, one is a rather minor one > while the other one seems like a normal bug. The patches for those are > part of this series (3/4 and 4/4). I agree that you have started gathering evidence and that the early indications are positive, if quite mildly so. If this time next year there are a few tens of such bugs found, preferably including some serious ones, I would very likely look favorably on pulling this in to allow others to use it. Thanx, Paul