[net-next,v2,01/15] net: introduce upper device lists

Message ID 1344956748-2099-2-git-send-email-jiri@resnulli.us
State Changes Requested, archived
Delegated to: David Miller

Commit Message

Jiri Pirko Aug. 14, 2012, 3:05 p.m. UTC
These lists store pointers to all upper devices. Eventually they will
replace the dev->master pointer, which is used for bonding, bridge and
team but cannot be used for vlan or macvlan, where multiple upper
devices might be present. If an upper link is a replacement for
dev->master, it is marked with the "master" flag.

The new upper device list resolves this limitation. The information
stored in the lists is also used to prevent looping setups like
"bond->somethingelse->samebond".

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
---
 include/linux/netdevice.h |   14 +++
 net/core/dev.c            |  236 ++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 246 insertions(+), 4 deletions(-)
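
The loop check described in the commit message can be modeled outside the kernel. The following is a minimal user-space sketch, not the patch's code: struct toy_dev, toy_has_upper() and toy_link() are hypothetical names, a fixed-size array stands in for the RCU-protected list_head, and a plain -1 replaces the kernel's errno values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_UPPERS 4	/* arbitrary bound for this toy model */

/* Hypothetical stand-in for struct net_device. */
struct toy_dev {
	const char *name;
	struct toy_dev *uppers[TOY_MAX_UPPERS];	/* replaces upper_dev_list */
	size_t n_uppers;
};

/* Return true if target is reachable by walking upward from dev. */
static bool toy_has_upper(const struct toy_dev *dev,
			  const struct toy_dev *target)
{
	for (size_t i = 0; i < dev->n_uppers; i++) {
		if (dev->uppers[i] == target)
			return true;
		if (toy_has_upper(dev->uppers[i], target))
			return true;
	}
	return false;
}

/* Link upper above dev. Returns 0 on success, -1 if the link is refused. */
static int toy_link(struct toy_dev *dev, struct toy_dev *upper)
{
	assert(dev && upper);

	if (dev == upper)
		return -1;
	/* Refuse if dev is already above upper: the link would close a loop. */
	if (toy_has_upper(upper, dev))
		return -1;
	/* Refuse duplicates and overflow of the fixed array. */
	for (size_t i = 0; i < dev->n_uppers; i++)
		if (dev->uppers[i] == upper)
			return -1;
	if (dev->n_uppers == TOY_MAX_UPPERS)
		return -1;

	dev->uppers[dev->n_uppers++] = upper;
	return 0;
}
```

With eth0 linked under bond0 and bond0 under br0, toy_link(&br0, &eth0) is refused because eth0 already reaches br0 through bond0.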

Comments

Ben Hutchings Aug. 14, 2012, 10:33 p.m. UTC | #1
On Tue, 2012-08-14 at 17:05 +0200, Jiri Pirko wrote:
> These lists store pointers to all upper devices. Eventually they will
> replace the dev->master pointer, which is used for bonding, bridge and
> team but cannot be used for vlan or macvlan, where multiple upper
> devices might be present. If an upper link is a replacement for
> dev->master, it is marked with the "master" flag.

Something I found interesting is that the dev->master pointer and now
netdev_master_upper_dev_get{,_rcu}() are hardly used by the stacked
drivers that set the master.  They also have to set an rx_handler on the
lower device (which is itself mutually exclusive) which gets its own
context pointer (rx_handler_data).

Instead, the master pointer is mostly used by device drivers to find out
about a bridge or bonding device above *their* devices.  And that seems
to work only for those specific device drivers, not e.g. openvswitch or
team.  I wonder if we could find a better way to encapsulate the things
they want to do, in a later step (not holding up this change!).

[...]
> +static int __netdev_upper_dev_link(struct net_device *dev,
> +                                  struct net_device *upper_dev, bool master)
> +{
> +       struct netdev_upper *upper;
> +
> +       ASSERT_RTNL();
> +
> +       if (dev == upper_dev)
> +               return -EBUSY;
> +       /*
> +        * To prevent loops, check if dev is not upper device to upper_dev.
> +        */
> +       if (__netdev_has_upper_dev(upper_dev, dev, true))
> +               return -EBUSY;
[...]

I think we will also need to limit the depth of the device stack so we
don't run out of stack space here.  __netif_receive() implements a kind
of tail recursion whenever a packet is passed up, but
__netdev_has_upper_dev() can't avoid doing real recursion (without the
addition of a flag to net_device so it can mark its progress).

Ben.
stephen hemminger Aug. 14, 2012, 11 p.m. UTC | #2
On Tue, 14 Aug 2012 23:33:44 +0100
Ben Hutchings <bhutchings@solarflare.com> wrote:

> On Tue, 2012-08-14 at 17:05 +0200, Jiri Pirko wrote:
> > These lists store pointers to all upper devices. Eventually they will
> > replace the dev->master pointer, which is used for bonding, bridge and
> > team but cannot be used for vlan or macvlan, where multiple upper
> > devices might be present. If an upper link is a replacement for
> > dev->master, it is marked with the "master" flag.
> 
> Something I found interesting is that the dev->master pointer and now
> netdev_master_upper_dev_get{,_rcu}() are hardly used by the stacked
> drivers that set the master.  They also have to set an rx_handler on the
> lower device (which is itself mutually exclusive) which gets its own
> context pointer (rx_handler_data).
> 
> Instead, the master pointer is mostly used by device drivers to find out
> about a bridge or bonding device above *their* devices.  And that seems
> to work only for those specific device drivers, not e.g. openvswitch or
> team.  I wonder if we could find a better way to encapsulate the things
> they want to do, in a later step (not holding up this change!).

The concept of master is very useful to user-level config tools like
Vyatta for seeing the parent/child relationship. Since it is in the ABI
now, it must stay.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jiri Pirko Aug. 15, 2012, 7:46 a.m. UTC | #3
Wed, Aug 15, 2012 at 12:33:44AM CEST, bhutchings@solarflare.com wrote:
>On Tue, 2012-08-14 at 17:05 +0200, Jiri Pirko wrote:
>> These lists store pointers to all upper devices. Eventually they will
>> replace the dev->master pointer, which is used for bonding, bridge and
>> team but cannot be used for vlan or macvlan, where multiple upper
>> devices might be present. If an upper link is a replacement for
>> dev->master, it is marked with the "master" flag.
>
>Something I found interesting is that the dev->master pointer and now
>netdev_master_upper_dev_get{,_rcu}() are hardly used by the stacked
>drivers that set the master.  They also have to set an rx_handler on the
>lower device (which is itself mutually exclusive) which gets its own
>context pointer (rx_handler_data).
>
>Instead, the master pointer is mostly used by device drivers to find out
>about a bridge or bonding device above *their* devices.  And that seems
>to work only for those specific device drivers, not e.g. openvswitch or
>team.  I wonder if we could find a better way to encapsulate the things
>they want to do, in a later step (not holding up this change!).

Yes. I was thinking about this as well. I believe that we should follow up
with this.

>
>[...]
>> +static int __netdev_upper_dev_link(struct net_device *dev,
>> +                                  struct net_device *upper_dev, bool master)
>> +{
>> +       struct netdev_upper *upper;
>> +
>> +       ASSERT_RTNL();
>> +
>> +       if (dev == upper_dev)
>> +               return -EBUSY;
>> +       /*
>> +        * To prevent loops, check if dev is not upper device to upper_dev.
>> +        */
>> +       if (__netdev_has_upper_dev(upper_dev, dev, true))
>> +               return -EBUSY;
>[...]
>
>I think we will also need to limit the depth of the device stack so we
>don't run out of stack space here.  __netif_receive() implements a kind
>of tail recursion whenever a packet is passed up, but
>__netdev_has_upper_dev() can't avoid doing real recursion (without the
>addition of a flag to net_device so it can mark its progress).
>

You are probably right. I'm not sure how to handle this correctly
though. Adding some hard limit number might not be correct.

The problem could also be resolved by adding another struct list_head
to struct netdev_upper and using it inside __netdev_has_upper_dev(). But
that does not seem right to me either (considering that walking the
tree could in future be done under RCU).
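
One way to picture this alternative is an iterative walk that threads a worklist through the devices themselves instead of recursing. This is only a sketch under assumptions: struct wl_dev, its todo_next and queued fields, and wl_has_upper() are hypothetical and do not appear in the patch. Because the walk writes to per-device state, it needs exclusive access to the devices, which is why it would sit poorly with lockless RCU readers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define WL_MAX_UPPERS 4		/* arbitrary bound for this toy model */

/* Hypothetical stand-in for a device with an intrusive worklist link. */
struct wl_dev {
	struct wl_dev *uppers[WL_MAX_UPPERS];
	size_t n_uppers;
	struct wl_dev *todo_next;	/* link used only during a walk */
	bool queued;			/* true while dev is on a walk list */
};

/* Clear the walk state of every device on a list. */
static void wl_reset(struct wl_dev *list)
{
	while (list) {
		struct wl_dev *next = list->todo_next;

		list->queued = false;
		list->todo_next = NULL;
		list = next;
	}
}

/* Iteratively check whether target is above dev; no recursion involved. */
static bool wl_has_upper(struct wl_dev *dev, const struct wl_dev *target)
{
	struct wl_dev *todo = dev;	/* devices still to expand */
	struct wl_dev *seen = NULL;	/* devices already expanded */
	bool found = false;

	assert(dev != NULL);
	dev->queued = true;
	dev->todo_next = NULL;

	while (todo && !found) {
		struct wl_dev *cur = todo;

		/* Move cur from the todo list to the seen list. */
		todo = cur->todo_next;
		cur->todo_next = seen;
		seen = cur;

		for (size_t i = 0; i < cur->n_uppers; i++) {
			struct wl_dev *up = cur->uppers[i];

			if (up == target) {
				found = true;
				break;
			}
			if (!up->queued) {	/* visit each device once */
				up->queued = true;
				up->todo_next = todo;
				todo = up;
			}
		}
	}

	/* Every marked device is on one of the two lists; unmark them all. */
	wl_reset(todo);
	wl_reset(seen);
	return found;
}
```

The queued mark also keeps a device that is reachable over several paths (for example a bridge above two vlans of the same port) from being expanded more than once.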

>Ben.
>
>-- 
>Ben Hutchings, Staff Engineer, Solarflare
>Not speaking for my employer; that's the marketing department's job.
>They asked us to note that Solarflare product names are trademarked.
>
David Miller Aug. 15, 2012, 10:12 p.m. UTC | #4
From: Ben Hutchings <bhutchings@solarflare.com>
Date: Tue, 14 Aug 2012 23:33:44 +0100

> I think we will also need to limit the depth of the device stack so we
> don't run out of stack space here.  __netif_receive() implements a kind
> of tail recursion whenever a packet is passed up, but
> __netdev_has_upper_dev() can't avoid doing real recursion (without the
> addition of a flag to net_device so it can mark its progress).

Agreed, we need some kind of limit here.
David Miller Aug. 15, 2012, 10:15 p.m. UTC | #5
From: Jiri Pirko <jiri@resnulli.us>
Date: Wed, 15 Aug 2012 09:46:12 +0200

> You are probably right. I'm not sure how to handle this correctly
> though. Adding some hard limit number might not be correct.

I would just use a hard limit of something like 8 for now, and if we
need to expand this limit we can consider how to do so with real known
usage in mind rather than pure speculation.
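
A hard limit of this kind could be enforced by carrying a depth counter through the recursive walk. The following is a minimal user-space sketch, assuming a limit of 8 as suggested above; LIM_MAX_DEPTH, struct lim_dev and lim_has_upper() are hypothetical names, and the three-valued return is a choice made for this example rather than something the patch defines.

```c
#include <assert.h>
#include <stddef.h>

#define LIM_MAX_DEPTH 8		/* hypothetical name for the hard limit */
#define LIM_MAX_UPPERS 4	/* arbitrary bound for this toy model */

/* Hypothetical stand-in for struct net_device. */
struct lim_dev {
	struct lim_dev *uppers[LIM_MAX_UPPERS];
	size_t n_uppers;
};

/*
 * Returns 1 if target is above dev, 0 if it is not, and -1 if the walk
 * would have to go more than LIM_MAX_DEPTH levels up. Callers start
 * the walk with depth 0.
 */
static int lim_has_upper(const struct lim_dev *dev,
			 const struct lim_dev *target, int depth)
{
	assert(dev && target);

	if (depth >= LIM_MAX_DEPTH)
		return -1;	/* stack of devices is too deep */

	for (size_t i = 0; i < dev->n_uppers; i++) {
		int ret;

		if (dev->uppers[i] == target)
			return 1;
		ret = lim_has_upper(dev->uppers[i], target, depth + 1);
		if (ret != 0)
			return ret;	/* found, or limit hit further up */
	}
	return 0;
}
```

A caller that links devices could treat the -1 result the same way as a detected loop and refuse the new link, which bounds the recursion at LIM_MAX_DEPTH frames.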
Jiri Pirko Aug. 16, 2012, 5:48 a.m. UTC | #6
Thu, Aug 16, 2012 at 12:15:49AM CEST, davem@davemloft.net wrote:
>From: Jiri Pirko <jiri@resnulli.us>
>Date: Wed, 15 Aug 2012 09:46:12 +0200
>
>> You are probably right. I'm not sure how to handle this correctly
>> though. Adding some hard limit number might not be correct.
>
>I would just use a hard limit of something like 8 for now, and if we
>need to expand this limit we can consider how to do so with real known
>usage in mind rather than pure speculation.

Okay. I will repost the set soon.

Patch

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index a9db4f3..95345d4 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1173,6 +1173,8 @@  struct net_device {
 					  * which this device is member of.
 					  */
 
+	struct list_head	upper_dev_list; /* List of upper devices */
+
 	/* Interface address info used in eth_type_trans() */
 	unsigned char		*dev_addr;	/* hw address, (before bcast
 						   because most packets are
@@ -2611,6 +2613,18 @@  extern int		netdev_max_backlog;
 extern int		netdev_tstamp_prequeue;
 extern int		weight_p;
 extern int		bpf_jit_enable;
+
+extern bool netdev_has_upper_dev(struct net_device *dev,
+				 struct net_device *upper_dev);
+extern bool netdev_has_any_upper_dev(struct net_device *dev);
+extern struct net_device *netdev_master_upper_dev_get(struct net_device *dev);
+extern struct net_device *netdev_master_upper_dev_get_rcu(struct net_device *dev);
+extern int netdev_upper_dev_link(struct net_device *dev,
+				 struct net_device *upper_dev);
+extern int netdev_master_upper_dev_link(struct net_device *dev,
+					struct net_device *upper_dev);
+extern void netdev_upper_dev_unlink(struct net_device *dev,
+				    struct net_device *upper_dev);
 extern int		netdev_set_master(struct net_device *dev, struct net_device *master);
 extern int netdev_set_bond_master(struct net_device *dev,
 				  struct net_device *master);
diff --git a/net/core/dev.c b/net/core/dev.c
index 1f06df8..d7fd235 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4425,6 +4425,229 @@  static int __init dev_proc_init(void)
 #endif	/* CONFIG_PROC_FS */
 
 
+struct netdev_upper {
+	struct net_device *dev;
+	bool master;
+	struct list_head list;
+	struct rcu_head rcu;
+};
+
+static bool __netdev_has_upper_dev(struct net_device *dev,
+				   struct net_device *upper_dev,
+				   bool deep)
+{
+	struct netdev_upper *upper;
+
+	list_for_each_entry(upper, &dev->upper_dev_list, list) {
+		if (upper->dev == upper_dev)
+			return true;
+		if (deep && __netdev_has_upper_dev(upper->dev, upper_dev, deep))
+			return true;
+	}
+	return false;
+}
+
+static struct netdev_upper *__netdev_find_upper(struct net_device *dev,
+						struct net_device *upper_dev)
+{
+	struct netdev_upper *upper;
+
+	list_for_each_entry(upper, &dev->upper_dev_list, list) {
+		if (upper->dev == upper_dev)
+			return upper;
+	}
+	return NULL;
+}
+
+/**
+ * netdev_has_upper_dev - Check if device is linked to an upper device
+ * @dev: device
+ * @upper_dev: upper device to check
+ *
+ * Find out if a device is linked to specified upper device and return true
+ * in case it is. The caller must hold the RTNL semaphore.
+ */
+bool netdev_has_upper_dev(struct net_device *dev,
+			  struct net_device *upper_dev)
+{
+	ASSERT_RTNL();
+
+	return __netdev_has_upper_dev(dev, upper_dev, false);
+}
+EXPORT_SYMBOL(netdev_has_upper_dev);
+
+/**
+ * netdev_has_any_upper_dev - Check if device is linked to some device
+ * @dev: device
+ *
+ * Find out if a device is linked to an upper device and return true in case
+ * it is. The caller must hold the RTNL semaphore.
+ */
+bool netdev_has_any_upper_dev(struct net_device *dev)
+{
+	ASSERT_RTNL();
+
+	return !list_empty(&dev->upper_dev_list);
+}
+EXPORT_SYMBOL(netdev_has_any_upper_dev);
+
+/**
+ * netdev_master_upper_dev_get - Get master upper device
+ * @dev: device
+ *
+ * Find a master upper device and return pointer to it or NULL in case
+ * it's not there. The caller must hold the RTNL semaphore.
+ */
+struct net_device *netdev_master_upper_dev_get(struct net_device *dev)
+{
+	struct netdev_upper *upper;
+
+	ASSERT_RTNL();
+
+	if (list_empty(&dev->upper_dev_list))
+		return NULL;
+
+	upper = list_first_entry(&dev->upper_dev_list,
+				 struct netdev_upper, list);
+	if (likely(upper->master))
+		return upper->dev;
+	return NULL;
+}
+EXPORT_SYMBOL(netdev_master_upper_dev_get);
+
+/**
+ * netdev_master_upper_dev_get_rcu - Get master upper device
+ * @dev: device
+ *
+ * Find a master upper device and return pointer to it or NULL in case
+ * it's not there. The caller must hold the RCU read lock.
+ */
+struct net_device *netdev_master_upper_dev_get_rcu(struct net_device *dev)
+{
+	struct netdev_upper *upper;
+
+	upper = list_first_or_null_rcu(&dev->upper_dev_list,
+				       struct netdev_upper, list);
+	if (upper && likely(upper->master))
+		return upper->dev;
+	return NULL;
+}
+EXPORT_SYMBOL(netdev_master_upper_dev_get_rcu);
+
+static int __netdev_upper_dev_link(struct net_device *dev,
+				   struct net_device *upper_dev, bool master)
+{
+	struct netdev_upper *upper;
+
+	ASSERT_RTNL();
+
+	if (dev == upper_dev)
+		return -EBUSY;
+	/*
+	 * To prevent loops, check if dev is not upper device to upper_dev.
+	 */
+	if (__netdev_has_upper_dev(upper_dev, dev, true))
+		return -EBUSY;
+
+	if (__netdev_find_upper(dev, upper_dev))
+		return -EEXIST;
+
+	if (master && netdev_master_upper_dev_get(dev))
+		return -EBUSY;
+
+	upper = kmalloc(sizeof(*upper), GFP_KERNEL);
+	if (!upper)
+		return -ENOMEM;
+
+	upper->dev = upper_dev;
+	upper->master = master;
+
+	/*
+	 * Ensure that master upper link is always the first item in the list.
+	 */
+	if (master)
+		list_add_rcu(&upper->list, &dev->upper_dev_list);
+	else
+		list_add_tail_rcu(&upper->list, &dev->upper_dev_list);
+	dev_hold(upper_dev);
+
+	return 0;
+}
+/**
+ * netdev_upper_dev_link - Add a link to the upper device
+ * @dev: device
+ * @upper_dev: new upper device
+ *
+ * Adds a link to device which is upper to this one. The caller must hold
+ * the RTNL semaphore. On a failure a negative errno code is returned.
+ * On success the reference counts are adjusted and the function
+ * returns zero.
+ */
+int netdev_upper_dev_link(struct net_device *dev,
+			  struct net_device *upper_dev)
+{
+	return __netdev_upper_dev_link(dev, upper_dev, false);
+}
+EXPORT_SYMBOL(netdev_upper_dev_link);
+
+/**
+ * netdev_master_upper_dev_link - Add a master link to the upper device
+ * @dev: device
+ * @upper_dev: new upper device
+ *
+ * Adds a link to device which is upper to this one. In this case, only
+ * one master upper device can be linked, although other non-master devices
+ * might be linked as well. The caller must hold the RTNL semaphore.
+ * On a failure a negative errno code is returned. On success the reference
+ * counts are adjusted and the function returns zero.
+ */
+int netdev_master_upper_dev_link(struct net_device *dev,
+				 struct net_device *upper_dev)
+{
+	return __netdev_upper_dev_link(dev, upper_dev, true);
+}
+EXPORT_SYMBOL(netdev_master_upper_dev_link);
+
+/**
+ * netdev_upper_free_rcu - Frees an upper device list item via the RCU pointer
+ * @entry: the entry's RCU field
+ *
+ * This function is designed to be used as a callback to the call_rcu()
+ * function so that the memory allocated to the netdev upper device list item
+ * can be released safely.
+ */
+static void netdev_upper_free_rcu(struct rcu_head *entry)
+{
+	struct netdev_upper *upper;
+
+	upper = container_of(entry, struct netdev_upper, rcu);
+	kfree(upper);
+}
+
+/**
+ * netdev_upper_dev_unlink - Removes a link to upper device
+ * @dev: device
+ * @upper_dev: upper device to unlink
+ *
+ * Removes a link to device which is upper to this one. The caller must hold
+ * the RTNL semaphore.
+ */
+void netdev_upper_dev_unlink(struct net_device *dev,
+			     struct net_device *upper_dev)
+{
+	struct netdev_upper *upper;
+
+	ASSERT_RTNL();
+
+	upper = __netdev_find_upper(dev, upper_dev);
+	if (!upper)
+		return;
+	list_del_rcu(&upper->list);
+	dev_put(upper_dev);
+	call_rcu(&upper->rcu, netdev_upper_free_rcu);
+}
+EXPORT_SYMBOL(netdev_upper_dev_unlink);
+
 /**
  *	netdev_set_master	-	set up master pointer
  *	@slave: slave device
@@ -4438,19 +4661,23 @@  static int __init dev_proc_init(void)
 int netdev_set_master(struct net_device *slave, struct net_device *master)
 {
 	struct net_device *old = slave->master;
+	int err;
 
 	ASSERT_RTNL();
 
 	if (master) {
 		if (old)
 			return -EBUSY;
-		dev_hold(master);
+		err = netdev_master_upper_dev_link(slave, master);
+		if (err)
+			return err;
 	}
 
 	slave->master = master;
 
 	if (old)
-		dev_put(old);
+		netdev_upper_dev_unlink(slave, old);
+
 	return 0;
 }
 EXPORT_SYMBOL(netdev_set_master);
@@ -5297,8 +5524,8 @@  static void rollback_registered_many(struct list_head *head)
 		if (dev->netdev_ops->ndo_uninit)
 			dev->netdev_ops->ndo_uninit(dev);
 
-		/* Notifier chain MUST detach us from master device. */
-		WARN_ON(dev->master);
+		/* Notifier chain MUST detach us from all upper devices. */
+		WARN_ON(netdev_has_any_upper_dev(dev));
 
 		/* Remove entries from kobject tree */
 		netdev_unregister_kobject(dev);
@@ -5999,6 +6226,7 @@  struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name,
 	INIT_LIST_HEAD(&dev->napi_list);
 	INIT_LIST_HEAD(&dev->unreg_list);
 	INIT_LIST_HEAD(&dev->link_watch_list);
+	INIT_LIST_HEAD(&dev->upper_dev_list);
 	dev->priv_flags = IFF_XMIT_DST_RELEASE;
 	setup(dev);