Message ID | 1364562082.5113.16.camel@edumazet-glaptop |
---|---|
State | Accepted, archived |
Delegated to: | David Miller |
Headers | show |
On Fri, 2013-03-29 at 06:01 -0700, Eric Dumazet wrote: > From: Eric Dumazet <edumazet@google.com> > > On Fri, 2013-03-29 at 10:48 +0100, Jiri Pirko wrote: > > > Hmm. I think that this might be issue introduced by: > > commit a9b3cd7f323b2e57593e7215362a7b02fc933e3a > > Author: Stephen Hemminger <shemminger@vyatta.com> > > Date: Mon Aug 1 16:19:00 2011 +0000 > > > > rcu: convert uses of rcu_assign_pointer(x, NULL) to RCU_INIT_POINTER > > > > > > Because, if rcu_dereference(dev->rx_handler) is null, > > rcu_dereference(dev->rx_handler_data) is never done. Therefore I believe > > we are hitting following scenario: > > > > > > CPU0 CPU1 > > ---- ---- > > dev->rx_handler_data = NULL > > rcu_read_lock() > > dev->rx_handler = NULL > > > > > > CPU0 will see rx_handler set and yet, rx_handler_data nulled. Write > > barrier in rcu_assign_pointer() might prevent this reorder from happening. > > Therefore I suggest: > > > > diff --git a/net/core/dev.c b/net/core/dev.c > > index 0caa38e..c16b829 100644 > > --- a/net/core/dev.c > > +++ b/net/core/dev.c > > @@ -3332,8 +3332,8 @@ void netdev_rx_handler_unregister(struct net_device *dev) > > { > > > > ASSERT_RTNL(); > > - RCU_INIT_POINTER(dev->rx_handler, NULL); > > - RCU_INIT_POINTER(dev->rx_handler_data, NULL); > > + rcu_assign_pointer(dev->rx_handler, NULL); > > + rcu_assign_pointer(dev->rx_handler_data, NULL); > > } > > EXPORT_SYMBOL_GPL(netdev_rx_handler_unregister); > > > > > > Nope this changes nothing at all. Exactly! In fact, the bug triggered on an older kernel that had the original rcu_assign_pointer() > > However, we can fix the bug in a different way, if we want to avoid a > test in fast path. > > With following patch, we can make sure that a reader seeing a non NULL > rx_handler has a guarantee to see a non NULL rx_handler_data > [..] > We can fix bug this in two ways. First is adding a test in > bond_handle_frame() and others to check if rx_handler_data is NULL. > > A second way is adding a synchronize_net() in > netdev_rx_handler_unregister() to make sure that a rcu protected reader > has the guarantee to see a non NULL rx_handler_data. > > The second way is better as it avoids an extra test in fast path. > > Reported-by: Steven Rostedt <rostedt@goodmis.org> > Signed-off-by: Eric Dumazet <edumazet@google.com> > Cc: Jiri Pirko <jpirko@redhat.com> > Cc: Paul E. McKenney <paulmck@us.ibm.com> > --- > net/core/dev.c | 6 ++++++ > 1 file changed, 6 insertions(+) > > diff --git a/net/core/dev.c b/net/core/dev.c > index b13e5c7..56932a4 100644 > --- a/net/core/dev.c > +++ b/net/core/dev.c > @@ -3314,6 +3314,7 @@ int netdev_rx_handler_register(struct net_device *dev, > if (dev->rx_handler) > return -EBUSY; > > + /* Note: rx_handler_data must be set before rx_handler */ > rcu_assign_pointer(dev->rx_handler_data, rx_handler_data); > rcu_assign_pointer(dev->rx_handler, rx_handler); > > @@ -3334,6 +3335,11 @@ void netdev_rx_handler_unregister(struct net_device *dev) > > ASSERT_RTNL(); > RCU_INIT_POINTER(dev->rx_handler, NULL); > + /* a reader seeing a non NULL rx_handler in a rcu_read_lock() > + * section has a guarantee to see a non NULL rx_handler_data > + * as well. > + */ > + synchronize_net(); I've thought about this too, but I wasn't sure we wanted two synchronize_*() functions, as the caller does a synchronize as well. That said, I think this is the more robust solution and it lets all rx_handler() functions assume that their rx_handler_data is set. And it removes the check from the fast path which outweighs an added synchronization in the slow path. Acked-by: Steven Rostedt <rostedt@goodmis.org> Thanks! -- Steve > RCU_INIT_POINTER(dev->rx_handler_data, NULL); > } > EXPORT_SYMBOL_GPL(netdev_rx_handler_unregister); > -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, 2013-03-29 at 09:17 -0400, Steven Rostedt wrote: > I've thought about this too, but I wasn't sure we wanted two > synchronize_*() functions, as the caller does a synchronize as well. > That said, I think this is the more robust solution and it lets all > rx_handler() functions assume that their rx_handler_data is set. And it > removes the check from the fast path which outweighs an added > synchronization in the slow path. > Note that I used synchronize_net(), which does a synchronize_rcu_expedited() when RTNL is locked, so its normally quite fast. > Acked-by: Steven Rostedt <rostedt@goodmis.org> > > Thanks! Thanks a lot for your very detailed report and analysis ! -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 03/29/2013 02:01 PM, Eric Dumazet wrote: >> CPU0 will see rx_handler set and yet, rx_handler_data nulled. Write >> >barrier in rcu_assign_pointer() might prevent this reorder from happening. >> >Therefore I suggest: >> > >> >diff --git a/net/core/dev.c b/net/core/dev.c >> >index 0caa38e..c16b829 100644 >> >--- a/net/core/dev.c >> >+++ b/net/core/dev.c >> >@@ -3332,8 +3332,8 @@ void netdev_rx_handler_unregister(struct net_device *dev) >> > { >> > >> > ASSERT_RTNL(); >> >- RCU_INIT_POINTER(dev->rx_handler, NULL); >> >- RCU_INIT_POINTER(dev->rx_handler_data, NULL); >> >+ rcu_assign_pointer(dev->rx_handler, NULL); >> >+ rcu_assign_pointer(dev->rx_handler_data, NULL); >> > } >> > EXPORT_SYMBOL_GPL(netdev_rx_handler_unregister); >> > >> > > Nope this changes nothing at all. Erik, why doesn't help the write barrier between the assignments. It should guarantee their orders... or not? Thanks, Ivan -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, 2013-03-29 at 16:11 +0100, Ivan Vecera wrote: > Erik, why doesn't help the write barrier between the assignments. It > should guarantee their orders... or not? > Its not enough, I wont explain here why as RCU is quite well documented in Documentation/RCU -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Fri, Mar 29, 2013 at 04:38:15PM CET, eric.dumazet@gmail.com wrote: >On Fri, 2013-03-29 at 16:11 +0100, Ivan Vecera wrote: > >> Erik, why doesn't help the write barrier between the assignments. It >> should guarantee their orders... or not? >> > >Its not enough, I wont explain here why as RCU is quite well documented >in Documentation/RCU Can you point me exact paragraph? I'm unable to find that :( > > > >-- >To unsubscribe from this list: send the line "unsubscribe netdev" in >the body of a message to majordomo@vger.kernel.org >More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, 2013-03-29 at 17:12 +0100, Jiri Pirko wrote: > Fri, Mar 29, 2013 at 04:38:15PM CET, eric.dumazet@gmail.com wrote: > >On Fri, 2013-03-29 at 16:11 +0100, Ivan Vecera wrote: > > > >> Erik, why doesn't help the write barrier between the assignments. It > >> should guarantee their orders... or not? > >> > > > >Its not enough, I wont explain here why as RCU is quite well documented > >in Documentation/RCU > > Can you point me exact paragraph? I'm unable to find that :( > You need a bit of RCU history to understand the issue rcu_assign_pointer(dev->rx_handler, NULL) is certainly not needing a barrier _before_ setting rx_handler to NULL. Old kernels had this rcu_assign_pointer() implementation since commit d99c4f6b13b3149bc83703ab1493beaeaaaf8a2d (Remove rcu_assign_pointer() penalty for NULL pointers) #define rcu_assign_pointer(p, v) \ ({ \ if (!__builtin_constant_p(v) || \ ((v) != NULL)) \ smp_wmb(); \ (p) = (v); \ }) Note that wmb() was _not_ done if v was NULL Because of various sparse issues, commit d322f45ceed525daa9401154590bbae3222cfefb (rcu: Make rcu_assign_pointer() unconditionally insert a memory barrier) removed the conditional, because RCU_INIT_POINTER() was available. In the rx_handler/rx_handler_data, we use two pointers protected by RCU, but we want to only test rx_hander being NULL, and avoid testing rx_handler_data. Nothing in RCU guarantees that two different pointers have any order. Here is what could happen CPU0 CPU1 handler = rcu_dereference(dev->rx_handler) if (handler) { handler(dev, ...); dev->rx_handler = NULL; smp_wmb(); // OR NOT dev->rx_handler_data = NULL; smp_wmb(); // OR NOT handler(dev) priv_data = rcu_dereference(dev->rx_handler_data); x = priv_data->some_field; // CRASH because priv_data is NULL -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Mar 29, 2013 at 06:01:22AM -0700, Eric Dumazet wrote: > From: Eric Dumazet <edumazet@google.com> > > On Fri, 2013-03-29 at 10:48 +0100, Jiri Pirko wrote: > > > Hmm. I think that this might be issue introduced by: > > commit a9b3cd7f323b2e57593e7215362a7b02fc933e3a > > Author: Stephen Hemminger <shemminger@vyatta.com> > > Date: Mon Aug 1 16:19:00 2011 +0000 > > > > rcu: convert uses of rcu_assign_pointer(x, NULL) to RCU_INIT_POINTER > > > > > > Because, if rcu_dereference(dev->rx_handler) is null, > > rcu_dereference(dev->rx_handler_data) is never done. Therefore I believe > > we are hitting following scenario: > > > > > > CPU0 CPU1 > > ---- ---- > > dev->rx_handler_data = NULL > > rcu_read_lock() > > dev->rx_handler = NULL > > > > > > CPU0 will see rx_handler set and yet, rx_handler_data nulled. Write > > barrier in rcu_assign_pointer() might prevent this reorder from happening. > > Therefore I suggest: > > > > diff --git a/net/core/dev.c b/net/core/dev.c > > index 0caa38e..c16b829 100644 > > --- a/net/core/dev.c > > +++ b/net/core/dev.c > > @@ -3332,8 +3332,8 @@ void netdev_rx_handler_unregister(struct net_device *dev) > > { > > > > ASSERT_RTNL(); > > - RCU_INIT_POINTER(dev->rx_handler, NULL); > > - RCU_INIT_POINTER(dev->rx_handler_data, NULL); > > + rcu_assign_pointer(dev->rx_handler, NULL); > > + rcu_assign_pointer(dev->rx_handler_data, NULL); > > } > > EXPORT_SYMBOL_GPL(netdev_rx_handler_unregister); > > > > > > Nope this changes nothing at all. > > However, we can fix the bug in a different way, if we want to avoid a > test in fast path. > > With following patch, we can make sure that a reader seeing a non NULL > rx_handler has a guarantee to see a non NULL rx_handler_data > > Thanks > > [PATCH] net: add a synchronize_net() in netdev_rx_handler_unregister() > > commit 35d48903e97819 (bonding: fix rx_handler locking) added a race > in bonding driver, reported by Steven Rostedt who did a very good > diagnosis : > > <quoting Steven> > > I'm currently debugging a crash in an old 3.0-rt kernel that one of our > customers is seeing. The bug happens with a stress test that loads and > unloads the bonding module in a loop (I don't know all the details as > I'm not the one that is directly interacting with the customer). But the > bug looks to be something that may still be present and possibly present > in mainline too. It will just be much harder to trigger it in mainline. > > In -rt, interrupts are threads, and can schedule in and out just like > any other thread. Note, mainline now supports interrupt threads so this > may be easily reproducible in mainline as well. I don't have the ability > to tell the customer to try mainline or other kernels, so my hands are > somewhat tied to what I can do. > > But according to a core dump, I tracked down that the eth irq thread > crashed in bond_handle_frame() here: > > slave = bond_slave_get_rcu(skb->dev); > bond = slave->bond; <--- BUG > > > the slave returned was NULL and accessing slave->bond caused a NULL > pointer dereference. > > Looking at the code that unregisters the handler: > > void netdev_rx_handler_unregister(struct net_device *dev) > { > > ASSERT_RTNL(); > RCU_INIT_POINTER(dev->rx_handler, NULL); > RCU_INIT_POINTER(dev->rx_handler_data, NULL); > } > > Which is basically: > dev->rx_handler = NULL; > dev->rx_handler_data = NULL; > > And looking at __netif_receive_skb() we have: > > rx_handler = rcu_dereference(skb->dev->rx_handler); > if (rx_handler) { > if (pt_prev) { > ret = deliver_skb(skb, pt_prev, orig_dev); > pt_prev = NULL; > } > switch (rx_handler(&skb)) { > > My question to all of you is, what stops this interrupt from happening > while the bonding module is unloading? What happens if the interrupt > triggers and we have this: > > > CPU0 CPU1 > ---- ---- > rx_handler = skb->dev->rx_handler > > netdev_rx_handler_unregister() { > dev->rx_handler = NULL; > dev->rx_handler_data = NULL; > > rx_handler() > bond_handle_frame() { > slave = skb->dev->rx_handler; > bond = slave->bond; <-- NULL pointer dereference!!! > > > What protection am I missing in the bond release handler that would > prevent the above from happening? > > </quoting Steven> > > We can fix bug this in two ways. First is adding a test in > bond_handle_frame() and others to check if rx_handler_data is NULL. > > A second way is adding a synchronize_net() in > netdev_rx_handler_unregister() to make sure that a rcu protected reader > has the guarantee to see a non NULL rx_handler_data. > > The second way is better as it avoids an extra test in fast path. > > Reported-by: Steven Rostedt <rostedt@goodmis.org> > Signed-off-by: Eric Dumazet <edumazet@google.com> > Cc: Jiri Pirko <jpirko@redhat.com> > Cc: Paul E. McKenney <paulmck@us.ibm.com> Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> With kudos to Steven Rostedt for his analogy between RCU and Schrödinger's cat. ;-) > --- > net/core/dev.c | 6 ++++++ > 1 file changed, 6 insertions(+) > > diff --git a/net/core/dev.c b/net/core/dev.c > index b13e5c7..56932a4 100644 > --- a/net/core/dev.c > +++ b/net/core/dev.c > @@ -3314,6 +3314,7 @@ int netdev_rx_handler_register(struct net_device *dev, > if (dev->rx_handler) > return -EBUSY; > > + /* Note: rx_handler_data must be set before rx_handler */ > rcu_assign_pointer(dev->rx_handler_data, rx_handler_data); > rcu_assign_pointer(dev->rx_handler, rx_handler); > > @@ -3334,6 +3335,11 @@ void netdev_rx_handler_unregister(struct net_device *dev) > > ASSERT_RTNL(); > RCU_INIT_POINTER(dev->rx_handler, NULL); > + /* a reader seeing a non NULL rx_handler in a rcu_read_lock() > + * section has a guarantee to see a non NULL rx_handler_data > + * as well. > + */ > + synchronize_net(); > RCU_INIT_POINTER(dev->rx_handler_data, NULL); > } > EXPORT_SYMBOL_GPL(netdev_rx_handler_unregister); > > > -- > To unsubscribe from this list: send the line "unsubscribe netdev" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, 2013-03-29 at 12:20 -0700, Paul E. McKenney wrote: > Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> > > With kudos to Steven Rostedt for his analogy between RCU and > Schrödinger's cat. ;-) Thanks Paul ! -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Date: Fri, 29 Mar 2013 12:20:11 -0700 > On Fri, Mar 29, 2013 at 06:01:22AM -0700, Eric Dumazet wrote: >> [PATCH] net: add a synchronize_net() in netdev_rx_handler_unregister() >> >> commit 35d48903e97819 (bonding: fix rx_handler locking) added a race >> in bonding driver, reported by Steven Rostedt who did a very good >> diagnosis : ... >> We can fix bug this in two ways. First is adding a test in >> bond_handle_frame() and others to check if rx_handler_data is NULL. >> >> A second way is adding a synchronize_net() in >> netdev_rx_handler_unregister() to make sure that a rcu protected reader >> has the guarantee to see a non NULL rx_handler_data. >> >> The second way is better as it avoids an extra test in fast path. >> >> Reported-by: Steven Rostedt <rostedt@goodmis.org> >> Signed-off-by: Eric Dumazet <edumazet@google.com> >> Cc: Jiri Pirko <jpirko@redhat.com> >> Cc: Paul E. McKenney <paulmck@us.ibm.com> > > Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> > > With kudos to Steven Rostedt for his analogy between RCU and > Schrödinger's cat. ;-) Applied and queued up for -stable, thanks Everyone. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/net/core/dev.c b/net/core/dev.c index b13e5c7..56932a4 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -3314,6 +3314,7 @@ int netdev_rx_handler_register(struct net_device *dev, if (dev->rx_handler) return -EBUSY; + /* Note: rx_handler_data must be set before rx_handler */ rcu_assign_pointer(dev->rx_handler_data, rx_handler_data); rcu_assign_pointer(dev->rx_handler, rx_handler); @@ -3334,6 +3335,11 @@ void netdev_rx_handler_unregister(struct net_device *dev) ASSERT_RTNL(); RCU_INIT_POINTER(dev->rx_handler, NULL); + /* a reader seeing a non NULL rx_handler in a rcu_read_lock() + * section has a guarantee to see a non NULL rx_handler_data + * as well. + */ + synchronize_net(); RCU_INIT_POINTER(dev->rx_handler_data, NULL); } EXPORT_SYMBOL_GPL(netdev_rx_handler_unregister);