
[BUG] Crash with NULL pointer dereference in bond_handle_frame in -rt (possibly mainline)

Message ID 20130329094856.GB1677@minipsycho.orion
State Not Applicable, archived
Delegated to: David Miller

Commit Message

Jiri Pirko March 29, 2013, 9:48 a.m. UTC
Thu, Mar 28, 2013 at 06:29:52PM CET, eric.dumazet@gmail.com wrote:
>On Thu, 2013-03-28 at 13:16 -0400, Steven Rostedt wrote:
>> Hi,
>> 
>> I'm currently debugging a crash in an old 3.0-rt kernel that one of our
>> customers is seeing. The bug happens with a stress test that loads and
>> unloads the bonding module in a loop (I don't know all the details as
>> I'm not the one that is directly interacting with the customer). But the
>> bug looks like something that may still be present, possibly in
>> mainline too. It will just be much harder to trigger it in mainline.
>> 
>> In -rt, interrupts are threads, and can schedule in and out just like
>> any other thread. Note, mainline now supports interrupt threads so this
>> may be easily reproducible in mainline as well. I don't have the ability
>> to tell the customer to try mainline or other kernels, so my hands are
>> somewhat tied to what I can do.
>> 
>> But according to a core dump, I tracked down that the eth irq thread
>> crashed in bond_handle_frame() here:
>> 
>> 	slave = bond_slave_get_rcu(skb->dev);
>> 	bond = slave->bond; <--- BUG
>> 
>> 
>> The slave returned was NULL, and accessing slave->bond caused a NULL
>> pointer dereference.
>> 
>> Looking at the code that unregisters the handler:
>> 
>> void netdev_rx_handler_unregister(struct net_device *dev)
>> {
>> 
>>         ASSERT_RTNL();
>>         RCU_INIT_POINTER(dev->rx_handler, NULL);
>>         RCU_INIT_POINTER(dev->rx_handler_data, NULL);
>> }
>> 
>> Which is basically:
>> 	dev->rx_handler = NULL;
>> 	dev->rx_handler_data = NULL;
>> 
>> And looking at __netif_receive_skb() we have:
>> 
>>         rx_handler = rcu_dereference(skb->dev->rx_handler);
>>         if (rx_handler) {
>>                 if (pt_prev) {
>>                         ret = deliver_skb(skb, pt_prev, orig_dev);
>>                         pt_prev = NULL;
>>                 }
>>                 switch (rx_handler(&skb)) {
>> 
>> My question to all of you is, what stops this interrupt from happening
>> while the bonding module is unloading?  What happens if the interrupt
>> triggers and we have this:
>> 
>> 
>> 	CPU0			CPU1
>> 	----			----
>>   rx_handler = skb->dev->rx_handler
>> 
>> 			netdev_rx_handler_unregister() {
>> 			   dev->rx_handler = NULL;
>> 			   dev->rx_handler_data = NULL;
>> 
>>   rx_handler()
>>    bond_handle_frame() {
>>     slave = skb->dev->rx_handler_data;
>>     bond = slave->bond; <-- NULL pointer dereference!!!
>> 
>> 
>> What protection am I missing in the bond release handler that would
>> prevent the above from happening?
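The interleaving above can be replayed deterministically in userspace C. This is a sketch, not kernel code: `struct toy_dev`, `toy_bond_handle_frame()` and `replay_race()` are hypothetical stand-ins, C11 sequentially consistent atomics take the place of the RCU accessors, and the handler returns -1 instead of dereferencing the NULL slave.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct bonding and struct slave. */
struct toy_bond  { int id; };
struct toy_slave { struct toy_bond *bond; };

struct toy_dev;
typedef int (*toy_rx_handler_t)(struct toy_dev *dev);

/* Only the two net_device fields involved in the race. */
struct toy_dev {
	_Atomic(toy_rx_handler_t) rx_handler;
	_Atomic(struct toy_slave *) rx_handler_data;
};

/* Like the unpatched bond_handle_frame(): fetch the slave from
 * rx_handler_data and use it. A NULL slave is reported as -1 here;
 * the kernel would dereference it and oops. */
static int toy_bond_handle_frame(struct toy_dev *dev)
{
	struct toy_slave *slave = atomic_load(&dev->rx_handler_data);

	if (slave == NULL)
		return -1;
	return slave->bond->id;
}

/* Replays the CPU0/CPU1 diagram. Every access is seq_cst (the
 * strongest C11 ordering) and rx_handler is cleared first. */
int replay_race(int unregister_in_between)
{
	struct toy_bond bond = { .id = 1 };
	struct toy_slave slave = { .bond = &bond };
	struct toy_dev dev;

	atomic_init(&dev.rx_handler, toy_bond_handle_frame);
	atomic_init(&dev.rx_handler_data, &slave);

	/* CPU0: rx_handler = skb->dev->rx_handler */
	toy_rx_handler_t rx_handler = atomic_load(&dev.rx_handler);

	if (unregister_in_between) {
		/* CPU1: netdev_rx_handler_unregister() */
		atomic_store(&dev.rx_handler, NULL);
		atomic_store(&dev.rx_handler_data, NULL);
	}

	/* CPU0: calls the value it loaded before the unregister */
	return rx_handler ? rx_handler(&dev) : 0;
}
```

`replay_race(0)` returns the bond id (1), while `replay_race(1)` returns -1, the NULL slave. Because every access is sequentially consistent and rx_handler is cleared before rx_handler_data, the sketch also suggests that stronger barriers alone cannot close this window.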


Hmm. I think that this might be an issue introduced by:
commit a9b3cd7f323b2e57593e7215362a7b02fc933e3a
Author: Stephen Hemminger <shemminger@vyatta.com>
Date:   Mon Aug 1 16:19:00 2011 +0000

    rcu: convert uses of rcu_assign_pointer(x, NULL) to RCU_INIT_POINTER


Because, if rcu_dereference(dev->rx_handler) is null,
rcu_dereference(dev->rx_handler_data) is never done. Therefore I believe
we are hitting the following scenario:


   CPU0				CPU1
   ----				----
  			    dev->rx_handler_data = NULL
 rcu_read_lock()
 			    dev->rx_handler = NULL


CPU0 will see rx_handler set and yet rx_handler_data already NULL. The write
barrier in rcu_assign_pointer() might prevent this reordering.
Therefore I suggest:


>
>Nothing :(
>
>bug introduced in commit 35d48903e9781975e823b359ee85c257c9ff5c1c
>(bonding: fix rx_handler locking)
>
>CC Jiri
>
>Fix seems simple :
>
>diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>index 6bbd90e..7956ca5 100644
>--- a/drivers/net/bonding/bond_main.c
>+++ b/drivers/net/bonding/bond_main.c
>@@ -1457,6 +1457,8 @@ static rx_handler_result_t bond_handle_frame(struct sk_buff **pskb)
> 	*pskb = skb;
> 
> 	slave = bond_slave_get_rcu(skb->dev);
>+	if (!slave)
>+		return ret;
> 	bond = slave->bond;
> 
> 	if (bond->params.arp_interval)
>
>
>
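A userspace sketch of what Eric's guard buys, with hypothetical types: `struct toy_dev` keeps only `rx_handler_data`, and `TOY_RX_DEFAULT` stands in for whatever `ret` was initialized to in bond_handle_frame(), which the hunk does not show. The slave pointer is read once into a local and checked before `slave->bond` is touched; a plain pointer is used for brevity where the kernel goes through bond_slave_get_rcu().

```c
#include <stddef.h>

/* Hypothetical result codes; TOY_RX_DEFAULT plays the role of the
 * initial value of ret, which the quoted hunk does not show. */
enum toy_rx_result { TOY_RX_DEFAULT, TOY_RX_TO_BOND };

struct toy_bond  { unsigned long rx_count; };
struct toy_slave { struct toy_bond *bond; };
struct toy_dev   { struct toy_slave *rx_handler_data; };

static enum toy_rx_result toy_bond_handle_frame(struct toy_dev *dev)
{
	enum toy_rx_result ret = TOY_RX_DEFAULT;

	/* Read rx_handler_data exactly once into a local; re-reading it
	 * after the check could observe a different value under a race. */
	struct toy_slave *slave = dev->rx_handler_data;

	/* Eric's guard: the slave is already being released. */
	if (!slave)
		return ret;

	slave->bond->rx_count++;   /* safe: slave is known to be non-NULL */
	return TOY_RX_TO_BOND;
}
```

A device whose `rx_handler_data` is already NULL now gets the default result back instead of an oops, while a live slave still reaches its bond.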
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Stephen Hemminger March 29, 2013, 3:46 p.m. UTC | #1
On Fri, 29 Mar 2013 10:48:56 +0100
Jiri Pirko <jpirko@redhat.com> wrote:

> index 0caa38e..c16b829 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -3332,8 +3332,8 @@ void netdev_rx_handler_unregister(struct net_device *dev)
>  {
>  
>  	ASSERT_RTNL();
> -	RCU_INIT_POINTER(dev->rx_handler, NULL);
> -	RCU_INIT_POINTER(dev->rx_handler_data, NULL);
> +	rcu_assign_pointer(dev->rx_handler, NULL);
> +	rcu_assign_pointer(dev->rx_handler_data, NULL);
>  }
>  EXPORT_SYMBOL_GPL(netdev_rx_handler_unregister);
It is worth noting that, at the time, rcu_assign_pointer() had a special
case: if the value was NULL, it would compile into RCU_INIT_POINTER()
without the barrier.
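Stephen's point can be sketched with C11 atomics in userspace. `toy_rcu_assign_pointer()`, `toy_rcu_init_pointer()` and `toy_rcu_dereference()` are hypothetical analogues, not the kernel macros: a release store plays the role of the write barrier, and an acquire load is a conservative stand-in for rcu_dereference(). Per Stephen's note the old macro dropped the barrier for NULL at compile time; this sketch makes the same choice at run time. Either way, a patch that only ever stores NULL gains no ordering from rcu_assign_pointer().

```c
#include <stdatomic.h>
#include <stddef.h>

struct toy_data { int value; };

/* Analogue of the old rcu_assign_pointer(): a release store, so the
 * initialization of *v is visible to any reader that sees the new
 * pointer. For NULL there is no pointee to publish, so the ordering
 * is skipped, mirroring the special case Stephen describes. */
static void toy_rcu_assign_pointer(_Atomic(struct toy_data *) *p,
				   struct toy_data *v)
{
	if (v == NULL)
		atomic_store_explicit(p, v, memory_order_relaxed);
	else
		atomic_store_explicit(p, v, memory_order_release);
}

/* Analogue of RCU_INIT_POINTER(): a store with no ordering at all. */
static void toy_rcu_init_pointer(_Atomic(struct toy_data *) *p,
				 struct toy_data *v)
{
	atomic_store_explicit(p, v, memory_order_relaxed);
}

/* Reader side; acquire is stronger than what rcu_dereference() needs. */
static struct toy_data *toy_rcu_dereference(_Atomic(struct toy_data *) *p)
{
	return atomic_load_explicit(p, memory_order_acquire);
}
```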
Steven Rostedt March 29, 2013, 6:36 p.m. UTC | #2
On Fri, 2013-03-29 at 10:48 +0100, Jiri Pirko wrote:

> Because, if rcu_dereference(dev->rx_handler) is null,
> rcu_dereference(dev->rx_handler_data) is never done. Therefore I believe
> we are hitting the following scenario:
> 
> 
>    CPU0				CPU1
>    ----				----
>   			    dev->rx_handler_data = NULL
>  rcu_read_lock()
>  			    dev->rx_handler = NULL
> 
> 

That is not what is happening and that is not how RCU works. That is,
rcu_read_lock() does not block nor does it really do much with ordering
at all.

The problem is entirely contained within the rcu_read_lock() section as well:


If you have:

	rcu_read_lock();
	rx_handler = dev->rx_handler;
	rx_handler();
	rcu_read_unlock();

where rx_handler references dev->rx_handler_data, you need much more than
making sure that dev->rx_handler is set to NULL before dev->rx_handler_data.

The way RCU works is that it lets things exist in a "dual state". Kind of
like Schrödinger's cat. The solution Eric posted is a classic RCU
example of how this works.

When you set dev->rx_handler to NULL, there are two states that currently
exist in the system: those that still see dev->rx_handler set to
something and those that see it set to NULL. You could put in memory
barriers to your heart's content, but you will still have a system that
sees things in a dual state. If you then set dev->rx_handler_data to NULL,
you risk that those that still see rx_handler as a function will reference
rx_handler_data after it has become NULL.

Think of it this way:

	dev->rx_handler() {

Once the function has been called, even if you set rx_handler to NULL
at this point, it makes no difference, even with memory barriers. This
CPU is about to execute the previous value of rx_handler and there's
nothing you can do to stop it. Setting rx_handler_data to NULL now can
cause that CPU to dereference the NULL pointer. The problem is not an
ordering one where rx_handler_data got set to NULL first.

But the beauty of RCU is the synchronize_*() functions, because they
put the system back into a single state. After the synchronization is
complete, the entire system sees rx_handler() as NULL. There is no worry
about setting rx_handler_data to NULL now because nothing will be
referencing the previous value of rx_handler because that value no
longer exists in the system.

That means Eric's solution fits perfectly well here.

	< system in single state : everyone sees rx_handler = function() >

	rx_handler = NULL;

	< system in dual state : new calls see rx_handler = NULL, but
	  current calls see rx_handler = function >

	synchronize_net();

	< system is back to single state: everyone sees rx_handler = NULL >

	rx_handler_data = NULL;

no problem ;-)
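Steve's ordering (clear rx_handler, wait, then clear rx_handler_data) can be replayed deterministically in userspace. This is a sketch under loud assumptions: every name is hypothetical, the `readers` counter is a toy stand-in for a grace period (real RCU does not track readers with a shared counter, and a counter like this can starve the updater under constant read load), and the reader and `toy_unregister_step()` are split into steps so the interleaving can be driven by hand.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_slave { int bond_id; };
typedef int (*toy_handler_t)(struct toy_slave *slave);

struct toy_dev {
	_Atomic(toy_handler_t) rx_handler;
	_Atomic(struct toy_slave *) rx_handler_data;
	atomic_int readers;        /* toy grace-period tracking, not real RCU */
};

/* The handler dereferences the slave, like bond_handle_frame(). */
static int toy_handler(struct toy_slave *slave)
{
	return slave ? slave->bond_id : -1;   /* -1: the kernel would oops */
}

/* One reader, split into two steps for a hand-driven replay. */
struct toy_reader { toy_handler_t fn; int result; };

static void toy_reader_begin(struct toy_dev *d, struct toy_reader *r)
{
	atomic_fetch_add(&d->readers, 1);       /* ~ rcu_read_lock() */
	r->fn = atomic_load(&d->rx_handler);
}

static void toy_reader_end(struct toy_dev *d, struct toy_reader *r)
{
	if (r->fn)
		r->result = r->fn(atomic_load(&d->rx_handler_data));
	atomic_fetch_sub(&d->readers, 1);       /* ~ rcu_read_unlock() */
}

/* State 0 clears rx_handler; state 1 waits until no reader is active
 * (~ synchronize_net()) and only then clears rx_handler_data.
 * Returns true once the unregister has finished. */
static bool toy_unregister_step(struct toy_dev *d, int *state)
{
	if (*state == 0) {
		atomic_store(&d->rx_handler, NULL);
		*state = 1;
		return false;
	}
	if (atomic_load(&d->readers) != 0)
		return false;                   /* grace period not over yet */
	atomic_store(&d->rx_handler_data, NULL);
	*state = 2;
	return true;
}

int replay_with_grace_period(void)
{
	static struct toy_slave slave = { 7 };
	struct toy_dev dev;
	struct toy_reader old = { NULL, 0 }, late = { NULL, 0 };
	int state = 0;

	atomic_init(&dev.rx_handler, toy_handler);
	atomic_init(&dev.rx_handler_data, &slave);
	atomic_init(&dev.readers, 0);

	toy_reader_begin(&dev, &old);             /* CPU0 sees the old handler */
	toy_unregister_step(&dev, &state);        /* CPU1: rx_handler = NULL */
	if (toy_unregister_step(&dev, &state))    /* CPU1 must wait for CPU0 */
		return -2;
	toy_reader_end(&dev, &old);               /* slave is still valid */
	if (!toy_unregister_step(&dev, &state))   /* now data can be cleared */
		return -3;

	toy_reader_begin(&dev, &late);            /* a new reader sees NULL */
	toy_reader_end(&dev, &late);              /* so it never calls fn */
	if (late.fn != NULL)
		return -4;
	return old.result;
}
```

If the readers check in state 1 is dropped, rx_handler_data is cleared while CPU0 is still inside its critical section, and the old reader's handler then sees NULL, which is the crash from the original report.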

-- Steve




Steven Rostedt March 29, 2013, 7:55 p.m. UTC | #3
On Fri, 2013-03-29 at 14:36 -0400, Steven Rostedt wrote:

This one's for you, Paul ;-)

That means Eric's solution fits perfectly well here.

	< system in single state : everyone sees cat = alive >

	insert_into_box(cat);

	< system in dual state : new calls see cat == dead, but
	  current calls see cat == alive >

	open_box();

	< system is back to single state: everyone sees cat = dead >

	funeral(cat); 

no problem ;-)

-- Steve



Jiri Pirko March 30, 2013, 9:19 a.m. UTC | #4
Fri, Mar 29, 2013 at 07:36:24PM CET, rostedt@goodmis.org wrote:
>On Fri, 2013-03-29 at 10:48 +0100, Jiri Pirko wrote:
>
>> Because, if rcu_dereference(dev->rx_handler) is null,
>> rcu_dereference(dev->rx_handler_data) is never done. Therefore I believe
>> we are hitting the following scenario:
>> 
>> 
>>    CPU0				CPU1
>>    ----				----
>>   			    dev->rx_handler_data = NULL
>>  rcu_read_lock()
>>  			    dev->rx_handler = NULL
>> 
>> 
>
>That is not what is happening and that is not how RCU works. That is,
>rcu_read_lock() does not block nor does it really do much with ordering
>at all.
>
>The problem is entirely contained within the rcu_read_lock() section as well:
>
>
>If you have:
>
>	rcu_read_lock();
>	rx_handler = dev->rx_handler;
>	rx_handler();
>	rcu_read_unlock();
>
>where rx_handler references dev->rx_handler_data, you need much more than
>making sure that dev->rx_handler is set to NULL before dev->rx_handler_data.
>
>The way RCU works is that it lets things exist in a "dual state". Kind of
>like Schrödinger's cat. The solution Eric posted is a classic RCU
>example of how this works.
>
>When you set dev->rx_handler to NULL, there are two states that currently
>exist in the system: those that still see dev->rx_handler set to
>something and those that see it set to NULL. You could put in memory
>barriers to your heart's content, but you will still have a system that
>sees things in a dual state. If you then set dev->rx_handler_data to NULL,
>you risk that those that still see rx_handler as a function will reference
>rx_handler_data after it has become NULL.
>
>Think of it this way:
>
>	dev->rx_handler() {
>
>Once the function has been called, even if you set rx_handler to NULL
>at this point, it makes no difference, even with memory barriers. This
>CPU is about to execute the previous value of rx_handler and there's
>nothing you can do to stop it. Setting rx_handler_data to NULL now can
>cause that CPU to dereference the NULL pointer. The problem is not an
>ordering one where rx_handler_data got set to NULL first.
>
>But the beauty of RCU is the synchronize_*() functions, because they
>put the system back into a single state. After the synchronization is
>complete, the entire system sees rx_handler() as NULL. There is no worry
>about setting rx_handler_data to NULL now because nothing will be
>referencing the previous value of rx_handler because that value no
>longer exists in the system.
>
>That means Eric's solution fits perfectly well here.
>
>	< system in single state : everyone sees rx_handler = function() >
>
>	rx_handler = NULL;
>
>	< system in dual state : new calls see rx_handler = NULL, but
>	  current calls see rx_handler = function >
>
>	synchronize_net();
>
>	< system is back to single state: everyone sees rx_handler = NULL >
>
>	rx_handler_data = NULL;
>
>no problem ;-)
>
>-- Steve


I think I understand now. I was under the false impression that while
rcu_read_lock() is held, the value returned by rcu_dereference(pointer) is
predetermined (for that single run, I mean).

Thank you very much for explanation!

Jiri


Patch

diff --git a/net/core/dev.c b/net/core/dev.c
index 0caa38e..c16b829 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3332,8 +3332,8 @@  void netdev_rx_handler_unregister(struct net_device *dev)
 {
 
 	ASSERT_RTNL();
-	RCU_INIT_POINTER(dev->rx_handler, NULL);
-	RCU_INIT_POINTER(dev->rx_handler_data, NULL);
+	rcu_assign_pointer(dev->rx_handler, NULL);
+	rcu_assign_pointer(dev->rx_handler_data, NULL);
 }
 EXPORT_SYMBOL_GPL(netdev_rx_handler_unregister);