
[net-next-2.6] net: add dev_close_many

Message ID: 1292249903-3865-1-git-send-email-opurdila@ixiacom.com
State: Changes Requested, archived
Delegated to: David Miller

Commit Message

Octavian Purdila Dec. 13, 2010, 2:18 p.m. UTC
Add dev_close_many and dev_deactivate_many to factorize another
expensive sync-rcu operation in the netdevice unregister path.

$ modprobe dummy numdummies=10000
$ ip link set dev dummy* up
$ time rmmod dummy

Without the patch           With the patch

real    0m 24.63s           real    0m 5.15s
user    0m 0.00s            user    0m 0.00s
sys     0m 6.05s            sys     0m 5.14s

Signed-off-by: Octavian Purdila <opurdila@ixiacom.com>
---
 include/net/sch_generic.h |    1 +
 net/core/dev.c            |  121 ++++++++++++++++++++++++++++----------------
 net/sched/sch_generic.c   |   29 ++++++++---
 3 files changed, 100 insertions(+), 51 deletions(-)

Comments

Eric Dumazet Dec. 13, 2010, 4:52 p.m. UTC | #1
On Monday, 13 December 2010 at 16:18 +0200, Octavian Purdila wrote:
> Add dev_close_many and dev_deactivate_many to factorize another
> expensive sync-rcu operation in the netdevice unregister path.
> 
> $ modprobe dummy numdummies=10000
> $ ip link set dev dummy* up
> $ time rmmod dummy
> 
> Without the patch           With the patch
> 
> real    0m 24.63s           real    0m 5.15s
> user    0m 0.00s            user    0m 0.00s
> sys     0m 6.05s            sys     0m 5.14s
> 
> Signed-off-by: Octavian Purdila <opurdila@ixiacom.com>
> ---

Hmm, I think this solves the "rmmod dummy" case, but not the "dismantle
devices one by one" case, which is the general one (on heavy-duty
tunnel/PPP servers).

I think we could use a kernel thread (a workqueue, presumably) handling
3 lists of devices to be dismantled, respecting one RCU grace period (or
rcu_barrier()) before transferring an item from one list to the
following one.

This way, each device removal could post a device to this kernel thread
and return to userspace immediately. The time RTNL is held would be
reduced (calls to synchronize_rcu() would be done with RTNL not held).



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Octavian Purdila Dec. 13, 2010, 5:23 p.m. UTC | #2
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Monday 13 December 2010, 18:52:25

> Hmm, I think this solves the "rmmod dummy" case, but not the "dismantle
> devices one by one" case, which is the general one (on heavy-duty
> tunnel/PPP servers).
> 
> I think we could use a kernel thread (a workqueue, presumably) handling
> 3 lists of devices to be dismantled, respecting one RCU grace period (or
> rcu_barrier()) before transferring an item from one list to the
> following one.
> 
> This way, each device removal could post a device to this kernel thread
> and return to userspace immediately. The time RTNL is held would be
> reduced (calls to synchronize_rcu() would be done with RTNL not held).

We also run into the case where we have to dismantle the interfaces one
by one, but we fix it by gathering the requests in userspace and then
doing a single unregister_netdevice_many operation.

I like the kernel thread / workqueue idea. But we would still need
unregister_netdevice_many and dev_close_many, right? We would put the
device on the unregister list in unregister_netdevice and call
unregister_netdevice_many from the kernel thread.


 
stephen hemminger Dec. 13, 2010, 5:32 p.m. UTC | #3
On Mon, 13 Dec 2010 19:23:26 +0200
Octavian Purdila <opurdila@ixiacom.com> wrote:

> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Monday 13 December 2010, 18:52:25
> 
> > Hmm, I think this solves the "rmmod dummy" case, but not the "dismantle
> > devices one by one" case, which is the general one (on heavy-duty
> > tunnel/PPP servers).
> > 
> > I think we could use a kernel thread (a workqueue, presumably) handling
> > 3 lists of devices to be dismantled, respecting one RCU grace period (or
> > rcu_barrier()) before transferring an item from one list to the
> > following one.
> > 
> > This way, each device removal could post a device to this kernel thread
> > and return to userspace immediately. The time RTNL is held would be
> > reduced (calls to synchronize_rcu() would be done with RTNL not held).
> 
> We also run into the case where we have to dismantle the interfaces one
> by one, but we fix it by gathering the requests in userspace and then
> doing a single unregister_netdevice_many operation.
> 
> I like the kernel thread / workqueue idea. But we would still need
> unregister_netdevice_many and dev_close_many, right? We would put the
> device on the unregister list in unregister_netdevice and call
> unregister_netdevice_many from the kernel thread.

With a message-based interface, there shouldn't be a need for this.
Just have one thread sending requests in user space, and one receiving
the ACKs.
Octavian Purdila Dec. 13, 2010, 5:52 p.m. UTC | #4
From: Stephen Hemminger <shemminger@vyatta.com>
Date: Monday 13 December 2010, 19:32:21

> With a message-based interface, there shouldn't be a need for this.
> Just have one thread sending requests in user space, and one receiving
> the ACKs.

Sorry, you lost me here :) Is there no need for the kernel thread /
workqueue, or not even for dev_close_many?
David Miller Dec. 13, 2010, 5:54 p.m. UTC | #5
From: Octavian Purdila <opurdila@ixiacom.com>
Date: Mon, 13 Dec 2010 16:18:23 +0200

> -static int __dev_close(struct net_device *dev)
> +static int __dev_close_many(struct list_head *head)
>  {
> -	const struct net_device_ops *ops = dev->netdev_ops;
> +	struct net_device *dev;
>  
> -	ASSERT_RTNL();
> -	might_sleep();
> +	list_for_each_entry(dev, head, unreg_list) {
> +		ASSERT_RTNL();
> +		might_sleep();

It doesn't make any sense to put these insertions into this loop since
they are testing top-level invariants that must be provided by the
caller.
stephen hemminger Dec. 13, 2010, 6:04 p.m. UTC | #6
On Mon, 13 Dec 2010 19:52:39 +0200
Octavian Purdila <opurdila@ixiacom.com> wrote:

> From: Stephen Hemminger <shemminger@vyatta.com>
> Date: Monday 13 December 2010, 19:32:21
> 
> > With a message-based interface, there shouldn't be a need for this.
> > Just have one thread sending requests in user space, and one receiving
> > the ACKs.
> 
> Sorry, you lost me here :) Is there no need for the kernel thread /
> workqueue, or not even for dev_close_many?

I assume the need for dev_close_many is coming from a user-space
application?

I expect that for this kind of special need, you would be better off not
using the normal iproute utilities and instead having a custom device
manager that speaks netlink directly.

Rather than doing a synchronous send-request-and-wait-for-ack cycle, the
utility could use sender and collector threads.
Octavian Purdila Dec. 13, 2010, 8:54 p.m. UTC | #7
From: Stephen Hemminger <shemminger@vyatta.com>
Date: Monday 13 December 2010, 20:04:47

> I assume the need for dev_close_many is coming from a user-space
> application?
> 
> I expect that for this kind of special need, you would be better off not
> using the normal iproute utilities and instead having a custom device
> manager that speaks netlink directly.
> 
> Rather than doing a synchronous send-request-and-wait-for-ack cycle, the
> utility could use sender and collector threads.

Actually, we need dev_close_many in order to speed up
unregister_netdevice_many, since in the unregister_netdevice_many path
there is still one more sync-rcu operation that is not factorized.

I will prepare a v2 to address David's comment and will also be more
explicit in the commit message.
stephen hemminger Dec. 13, 2010, 11:34 p.m. UTC | #8
On Mon, 13 Dec 2010 22:54:24 +0200
Octavian Purdila <opurdila@ixiacom.com> wrote:

> From: Stephen Hemminger <shemminger@vyatta.com>
> Date: Monday 13 December 2010, 20:04:47
> 
> > I assume the need for dev_close_many is coming from a user-space
> > application?
> > 
> > I expect that for this kind of special need, you would be better off not
> > using the normal iproute utilities and instead having a custom device
> > manager that speaks netlink directly.
> > 
> > Rather than doing a synchronous send-request-and-wait-for-ack cycle, the
> > utility could use sender and collector threads.
> 
> Actually, we need dev_close_many in order to speed up
> unregister_netdevice_many, since in the unregister_netdevice_many path
> there is still one more sync-rcu operation that is not factorized.
> 
> I will prepare a v2 to address David's comment and will also be more
> explicit in the commit message.

That makes sense.

Patch

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index ea1f8a8..786cc39 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -321,6 +321,7 @@  extern void dev_init_scheduler(struct net_device *dev);
 extern void dev_shutdown(struct net_device *dev);
 extern void dev_activate(struct net_device *dev);
 extern void dev_deactivate(struct net_device *dev);
+extern void dev_deactivate_many(struct list_head *head);
 extern struct Qdisc *dev_graft_qdisc(struct netdev_queue *dev_queue,
 				     struct Qdisc *qdisc);
 extern void qdisc_reset(struct Qdisc *qdisc);
diff --git a/net/core/dev.c b/net/core/dev.c
index d28b3a0..7cab19f 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1222,51 +1222,88 @@  int dev_open(struct net_device *dev)
 }
 EXPORT_SYMBOL(dev_open);
 
-static int __dev_close(struct net_device *dev)
+static int __dev_close_many(struct list_head *head)
 {
-	const struct net_device_ops *ops = dev->netdev_ops;
+	struct net_device *dev;
 
-	ASSERT_RTNL();
-	might_sleep();
+	list_for_each_entry(dev, head, unreg_list) {
+		ASSERT_RTNL();
+		might_sleep();
 
-	/*
-	 *	Tell people we are going down, so that they can
-	 *	prepare to death, when device is still operating.
-	 */
-	call_netdevice_notifiers(NETDEV_GOING_DOWN, dev);
+		/*
+		 *	Tell people we are going down, so that they can
+		 *	prepare to death, when device is still operating.
+		 */
+		call_netdevice_notifiers(NETDEV_GOING_DOWN, dev);
 
-	clear_bit(__LINK_STATE_START, &dev->state);
+		clear_bit(__LINK_STATE_START, &dev->state);
 
-	/* Synchronize to scheduled poll. We cannot touch poll list,
-	 * it can be even on different cpu. So just clear netif_running().
-	 *
-	 * dev->stop() will invoke napi_disable() on all of it's
-	 * napi_struct instances on this device.
-	 */
-	smp_mb__after_clear_bit(); /* Commit netif_running(). */
+		/* Synchronize to scheduled poll. We cannot touch poll list, it
+		 * can be even on different cpu. So just clear netif_running().
+		 *
+		 * dev->stop() will invoke napi_disable() on all of it's
+		 * napi_struct instances on this device.
+		 */
+		smp_mb__after_clear_bit(); /* Commit netif_running(). */
+	}
 
-	dev_deactivate(dev);
+	dev_deactivate_many(head);
 
-	/*
-	 *	Call the device specific close. This cannot fail.
-	 *	Only if device is UP
-	 *
-	 *	We allow it to be called even after a DETACH hot-plug
-	 *	event.
-	 */
-	if (ops->ndo_stop)
-		ops->ndo_stop(dev);
+	list_for_each_entry(dev, head, unreg_list) {
+		const struct net_device_ops *ops = dev->netdev_ops;
 
-	/*
-	 *	Device is now down.
-	 */
+		/*
+		 *	Call the device specific close. This cannot fail.
+		 *	Only if device is UP
+		 *
+		 *	We allow it to be called even after a DETACH hot-plug
+		 *	event.
+		 */
+		if (ops->ndo_stop)
+			ops->ndo_stop(dev);
+
+		/*
+		 *	Device is now down.
+		 */
+
+		dev->flags &= ~IFF_UP;
+
+		/*
+		 *	Shutdown NET_DMA
+		 */
+		net_dmaengine_put();
+	}
 
-	dev->flags &= ~IFF_UP;
+	return 0;
+}
+
+static int __dev_close(struct net_device *dev)
+{
+	LIST_HEAD(single);
+
+	list_add(&dev->unreg_list, &single);
+	return __dev_close_many(&single);
+}
+
+int dev_close_many(struct list_head *head)
+{
+	struct net_device *dev, *tmp;
+
+	list_for_each_entry_safe(dev, tmp, head, unreg_list)
+		if (!(dev->flags & IFF_UP)) {
+			list_del(&dev->unreg_list);
+			continue;
+		}
+
+	__dev_close_many(head);
 
 	/*
-	 *	Shutdown NET_DMA
+	 * Tell people we are down
 	 */
-	net_dmaengine_put();
+	list_for_each_entry(dev, head, unreg_list) {
+		rtmsg_ifinfo(RTM_NEWLINK, dev, IFF_UP|IFF_RUNNING);
+		call_netdevice_notifiers(NETDEV_DOWN, dev);
+	}
 
 	return 0;
 }
@@ -1282,16 +1319,10 @@  static int __dev_close(struct net_device *dev)
  */
 int dev_close(struct net_device *dev)
 {
-	if (!(dev->flags & IFF_UP))
-		return 0;
-
-	__dev_close(dev);
+	LIST_HEAD(single);
 
-	/*
-	 * Tell people we are down
-	 */
-	rtmsg_ifinfo(RTM_NEWLINK, dev, IFF_UP|IFF_RUNNING);
-	call_netdevice_notifiers(NETDEV_DOWN, dev);
+	list_add(&dev->unreg_list, &single);
+	dev_close_many(&single);
 
 	return 0;
 }
@@ -4958,10 +4989,12 @@  static void rollback_registered_many(struct list_head *head)
 		}
 
 		BUG_ON(dev->reg_state != NETREG_REGISTERED);
+	}
 
-		/* If device is running, close it first. */
-		dev_close(dev);
+	/* If device is running, close it first. */
+	dev_close_many(head);
 
+	list_for_each_entry(dev, head, unreg_list) {
 		/* And unlink it from device chain. */
 		unlist_netdevice(dev);
 
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 0918834..34dc598 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -810,20 +810,35 @@  static bool some_qdisc_is_busy(struct net_device *dev)
 	return false;
 }
 
-void dev_deactivate(struct net_device *dev)
+void dev_deactivate_many(struct list_head *head)
 {
-	netdev_for_each_tx_queue(dev, dev_deactivate_queue, &noop_qdisc);
-	if (dev_ingress_queue(dev))
-		dev_deactivate_queue(dev, dev_ingress_queue(dev), &noop_qdisc);
+	struct net_device *dev;
 
-	dev_watchdog_down(dev);
+	list_for_each_entry(dev, head, unreg_list) {
+		netdev_for_each_tx_queue(dev, dev_deactivate_queue,
+					 &noop_qdisc);
+		if (dev_ingress_queue(dev))
+			dev_deactivate_queue(dev, dev_ingress_queue(dev),
+					     &noop_qdisc);
+
+		dev_watchdog_down(dev);
+	}
 
 	/* Wait for outstanding qdisc-less dev_queue_xmit calls. */
 	synchronize_rcu();
 
 	/* Wait for outstanding qdisc_run calls. */
-	while (some_qdisc_is_busy(dev))
-		yield();
+	list_for_each_entry(dev, head, unreg_list)
+		while (some_qdisc_is_busy(dev))
+			yield();
+}
+
+void dev_deactivate(struct net_device *dev)
+{
+	LIST_HEAD(single);
+
+	list_add(&dev->unreg_list, &single);
+	dev_deactivate_many(&single);
 }
 
 static void dev_init_scheduler_queue(struct net_device *dev,