Patchwork [04/10] AOE: use rcu to find network device

login
register
mail settings
Submitter stephen hemminger
Date Nov. 10, 2009, 5:54 p.m.
Message ID <20091110175647.409162953@vyatta.com>
Download mbox | patch
Permalink /patch/38077/
State Rejected
Delegated to: David Miller
Headers show

Comments

stephen hemminger - Nov. 10, 2009, 5:54 p.m.
This gets rid of another use of read_lock(&dev_base_lock) by using
RCU. Also, it only increments the reference count of the device actually
used rather than holding and releasing every device

Compile tested only.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Eric Dumazet - Nov. 10, 2009, 6:23 p.m.
Stephen Hemminger a écrit :
> This gets rid of another use of read_lock(&dev_base_lock) by using
> RCU. Also, it only increments the reference count of the device actually
> used rather than holding and releasing every device
> 
> Compile tested only.
> 
> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
> 
> 
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ed Cashin - Nov. 10, 2009, 8:01 p.m.
On Tue Nov 10 13:07:37 EST 2009, shemminger@vyatta.com wrote:
> This gets rid of another use of read_lock(&dev_base_lock) by using
> RCU. Also, it only increments the reference count of the device actually
> used rather than holding and releasing every device
> 
> Compile tested only.

This function runs once a minute when the aoe driver is loaded,
if you'd like to test it a bit more.

It looks like there's no dev_put corresponding to the dev_hold
after the changes.
stephen hemminger - Nov. 10, 2009, 11:06 p.m.
On Tue, 10 Nov 2009 15:01:49 -0500
Ed Cashin <ecashin@coraid.com> wrote:

> On Tue Nov 10 13:07:37 EST 2009, shemminger@vyatta.com wrote:
> > This gets rid of another use of read_lock(&dev_base_lock) by using
> > RCU. Also, it only increments the reference count of the device actually
> > used rather than holding and releasing every device
> > 
> > Compile tested only.
> 
> This function runs once a minute when the aoe driver is loaded,
> if you'd like to test it a bit more.
> 
> It looks like there's no dev_put corresponding to the dev_hold
> after the changes.
> 

Hmm, looks like AOE actually is not ref counting the network device.
So my patch is incorrect. 

As it stands (before my patch), it is UNSAFE. It can decide to queue
packets to a device that is removed out from underneath it causing
reference to freed memory.

Moving the rcu_read_lock up to aoecmd_cfg() would solve that but the
whole driver appears to be unsafe about device refcounting and handling
device removal properly.  

It needs to:

1. Get a device ref count when it remembers a device: (ie addif)
2. Install a notifier that looks for device removal events
3. In notifier, remove interface, including flushing all pending
   skb's for that device.

This obviously is beyond the scope of the RCU stuff.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ed Cashin - Nov. 11, 2009, 2:22 p.m.
On Tue Nov 10 18:06:41 EST 2009, shemminger@vyatta.com wrote:
...
> Hmm, looks like AOE actually is not ref counting the network device.
> So my patch is incorrect. 
> 
> As it stands (before my patch), it is UNSAFE. It can decide to queue
> packets to a device that is removed out from underneath it causing
> reference to freed memory.

Yes, that's right.  I will look at the patch you sent in the follow
up.  Thanks.

Patch

--- a/drivers/block/aoe/aoecmd.c	2009-11-09 22:19:06.082480836 -0800
+++ b/drivers/block/aoe/aoecmd.c	2009-11-10 09:28:38.222438732 -0800
@@ -296,17 +296,18 @@  aoecmd_cfg_pkts(ushort aoemajor, unsigne
 	struct sk_buff *skb;
 	struct net_device *ifp;
 
-	read_lock(&dev_base_lock);
-	for_each_netdev(&init_net, ifp) {
-		dev_hold(ifp);
+	rcu_read_lock();
+	for_each_netdev_rcu(&init_net, ifp) {
 		if (!is_aoe_netif(ifp))
-			goto cont;
+			continue;
 
 		skb = new_skb(sizeof *h + sizeof *ch);
 		if (skb == NULL) {
 			printk(KERN_INFO "aoe: skb alloc failure\n");
-			goto cont;
+			continue;
 		}
+
+		dev_hold(ifp);
 		skb_put(skb, sizeof *h + sizeof *ch);
 		skb->dev = ifp;
 		__skb_queue_tail(queue, skb);
@@ -320,11 +321,8 @@  aoecmd_cfg_pkts(ushort aoemajor, unsigne
 		h->major = cpu_to_be16(aoemajor);
 		h->minor = aoeminor;
 		h->cmd = AOECMD_CFG;
-
-cont:
-		dev_put(ifp);
 	}
-	read_unlock(&dev_base_lock);
+	rcu_read_unlock();
 }
 
 static void