
[net-next,2/3] net: introduce per-netns netdevice notifiers

Message ID 20190930081511.26915-3-jiri@resnulli.us
State Accepted
Delegated to: David Miller
Series net: introduce per-netns netdevice notifiers and use them in mlxsw

Commit Message

Jiri Pirko Sept. 30, 2019, 8:15 a.m. UTC
From: Jiri Pirko <jiri@mellanox.com>

Often code, for example in drivers, is interested in getting notifier
calls only from a certain network namespace. In addition to the existing
global netdevice notifier chain, introduce per-netns chains and allow
users to register to them. Eventually this would eliminate unnecessary
overhead when there are many netdevices in many network namespaces.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 include/linux/netdevice.h   |  3 ++
 include/net/net_namespace.h |  3 ++
 net/core/dev.c              | 87 +++++++++++++++++++++++++++++++++++++
 3 files changed, 93 insertions(+)

Comments

Andrew Lunn Sept. 30, 2019, 1:38 p.m. UTC | #1
>  static int call_netdevice_notifiers_info(unsigned long val,
>  					 struct netdev_notifier_info *info)
>  {
> +	struct net *net = dev_net(info->dev);
> +	int ret;
> +
>  	ASSERT_RTNL();
> +
> +	/* Run per-netns notifier block chain first, then run the global one.
> +	 * Hopefully, one day, the global one is going to be removed after
> +	 * all notifier block registrators get converted to be per-netns.
> +	 */

Hi Jiri

Is that really going to happen? register_netdevice_notifier() is used
in 130 files. Do you plan to spend the time to make it happen?

> +	ret = raw_notifier_call_chain(&net->netdev_chain, val, info);
> +	if (ret & NOTIFY_STOP_MASK)
> +		return ret;
>  	return raw_notifier_call_chain(&netdev_chain, val, info);
>  }

Humm. I wonder about NOTIFY_STOP_MASK here. These are two separate
chains. Should one chain be able to stop the other chain? Are there
other examples where NOTIFY_STOP_MASK crosses a chain boundary?

      Andrew
Jiri Pirko Sept. 30, 2019, 2:23 p.m. UTC | #2
Mon, Sep 30, 2019 at 03:38:24PM CEST, andrew@lunn.ch wrote:
>>  static int call_netdevice_notifiers_info(unsigned long val,
>>  					 struct netdev_notifier_info *info)
>>  {
>> +	struct net *net = dev_net(info->dev);
>> +	int ret;
>> +
>>  	ASSERT_RTNL();
>> +
>> +	/* Run per-netns notifier block chain first, then run the global one.
>> +	 * Hopefully, one day, the global one is going to be removed after
>> +	 * all notifier block registrators get converted to be per-netns.
>> +	 */
>
>Hi Jiri
>
>Is that really going to happen? register_netdevice_notifier() is used
>in 130 files. Do you plan to spend the time to make it happen?

That's why I prepended the sentence with "Hopefully, one day"...


>
>> +	ret = raw_notifier_call_chain(&net->netdev_chain, val, info);
>> +	if (ret & NOTIFY_STOP_MASK)
>> +		return ret;
>>  	return raw_notifier_call_chain(&netdev_chain, val, info);
>>  }
>
>Humm. I wonder about NOTIFY_STOP_MASK here. These are two separate
>chains. Should one chain be able to stop the other chain? Are there

Well, if the failing item were in the second chain, at the beginning
of it, processing would be stopped too. It does not matter where the
stop happens; the point is that the whole processing stops. That is why
I added the check here.


>other examples where NOTIFY_STOP_MASK crosses a chain boundary?

Not aware of it, no. Could you please describe what is wrong?


>
>      Andrew
Andrew Lunn Sept. 30, 2019, 3:33 p.m. UTC | #3
On Mon, Sep 30, 2019 at 04:23:49PM +0200, Jiri Pirko wrote:
> Mon, Sep 30, 2019 at 03:38:24PM CEST, andrew@lunn.ch wrote:
> >>  static int call_netdevice_notifiers_info(unsigned long val,
> >>  					 struct netdev_notifier_info *info)
> >>  {
> >> +	struct net *net = dev_net(info->dev);
> >> +	int ret;
> >> +
> >>  	ASSERT_RTNL();
> >> +
> >> +	/* Run per-netns notifier block chain first, then run the global one.
> >> +	 * Hopefully, one day, the global one is going to be removed after
> >> +	 * all notifier block registrators get converted to be per-netns.
> >> +	 */
> >
> >Hi Jiri
> >
> >Is that really going to happen? register_netdevice_notifier() is used
> >in 130 files. Do you plan to spend the time to make it happen?
> 
> That's why I prepended the sentence with "Hopefully, one day"...
> 
> 
> >
> >> +	ret = raw_notifier_call_chain(&net->netdev_chain, val, info);
> >> +	if (ret & NOTIFY_STOP_MASK)
> >> +		return ret;
> >>  	return raw_notifier_call_chain(&netdev_chain, val, info);
> >>  }
> >
> >Humm. I wonder about NOTIFY_STOP_MASK here. These are two separate
> >chains. Should one chain be able to stop the other chain? Are there
> 
> Well, if the failing item were in the second chain, at the beginning
> of it, processing would be stopped too. It does not matter where the
> stop happens; the point is that the whole processing stops. That is why
> I added the check here.
> 
> 
> >other examples where NOTIFY_STOP_MASK crosses a chain boundary?
> 
> Not aware of it, no. Could you please describe what is wrong?

You are expanding the meaning of NOTIFY_STOP_MASK. It can now stop
some other chain. If this were one chain with a filter, I would not be
asking. But these are two different chains, and one chain can stop
another? At minimum, I think this needs to be reviewed by the core
kernel people.

But I'm also wondering if you are solving the problem at the wrong
level. Are there other notifier chains which would benefit from
respecting namespace boundaries? Would a better solution be to extend
struct notifier_block with some sort of filter?

Do you have some performance numbers? Where are you getting your
performance gains from? From the fact that you are doing NOTIFY_STOP_MASK
earlier, so preventing a long chain being walked? I notice
notifier_block has a priority field. Did you try using that to put your
notifier earlier in the chain?

	 Andrew
Jiri Pirko Sept. 30, 2019, 6:01 p.m. UTC | #4
Mon, Sep 30, 2019 at 05:33:43PM CEST, andrew@lunn.ch wrote:
>On Mon, Sep 30, 2019 at 04:23:49PM +0200, Jiri Pirko wrote:
>> Mon, Sep 30, 2019 at 03:38:24PM CEST, andrew@lunn.ch wrote:
>> >>  static int call_netdevice_notifiers_info(unsigned long val,
>> >>  					 struct netdev_notifier_info *info)
>> >>  {
>> >> +	struct net *net = dev_net(info->dev);
>> >> +	int ret;
>> >> +
>> >>  	ASSERT_RTNL();
>> >> +
>> >> +	/* Run per-netns notifier block chain first, then run the global one.
>> >> +	 * Hopefully, one day, the global one is going to be removed after
>> >> +	 * all notifier block registrators get converted to be per-netns.
>> >> +	 */
>> >
>> >Hi Jiri
>> >
>> >Is that really going to happen? register_netdevice_notifier() is used
>> >in 130 files. Do you plan to spend the time to make it happen?
>> 
>> That's why I prepended the sentence with "Hopefully, one day"...
>> 
>> 
>> >
>> >> +	ret = raw_notifier_call_chain(&net->netdev_chain, val, info);
>> >> +	if (ret & NOTIFY_STOP_MASK)
>> >> +		return ret;
>> >>  	return raw_notifier_call_chain(&netdev_chain, val, info);
>> >>  }
>> >
>> >Humm. I wonder about NOTIFY_STOP_MASK here. These are two separate
>> >chains. Should one chain be able to stop the other chain? Are there
>> 
>> Well, if the failing item were in the second chain, at the beginning
>> of it, processing would be stopped too. It does not matter where the
>> stop happens; the point is that the whole processing stops. That is why
>> I added the check here.
>> 
>> 
>> >other examples where NOTIFY_STOP_MASK crosses a chain boundary?
>> 
>> Not aware of it, no. Could you please describe what is wrong?
>
>You are expanding the meaning of NOTIFY_STOP_MASK. It can now stop
>some other chain. If this were one chain with a filter, I would not be

Well, it was originally a single chain, so the semantics stay intact.
Again, it is not some other independent chain. It's just the netns one
and the general one; both serve the same purpose.


>asking. But these are two different chains, and one chain can stop
>another? At minimum, I think this needs to be reviewed by the core
>kernel people.
>
>But I'm also wondering if you are solving the problem at the wrong
>level. Are there other notifier chains which would benefit from
>respecting namespace boundaries? Would a better solution be to extend
>struct notifier_block with some sort of filter?

I mentioned my primary motivation in the cover letter. What I want to
avoid is the need to take &pernet_ops_rwsem during registration of the
notifier, and thereby avoid a deadlock in my use case.

Plus it seems very clear that if a notifier knows which netns it is
interested in, it just registers in that particular netns chain.
Having one fat generic chain with filters is basically what we have
right now.


>
>Do you have some performance numbers? Where are you getting your
>performance gains from? From the fact that you are doing NOTIFY_STOP_MASK
>earlier, so preventing a long chain being walked? I notice
>notifier_block has a priority field. Did you try using that to put your
>notifier earlier in the chain?

It is not about stopping the chain earlier, not at all. It is the fact
that with many netdevices in many network namespaces you get a lot of
wasted calls to notifier registrants that do not care.



>
>	 Andrew
David Miller Oct. 2, 2019, 3:47 p.m. UTC | #5
From: Jiri Pirko <jiri@resnulli.us>
Date: Mon, 30 Sep 2019 10:15:10 +0200

> From: Jiri Pirko <jiri@mellanox.com>
> 
> Often code, for example in drivers, is interested in getting notifier
> calls only from a certain network namespace. In addition to the existing
> global netdevice notifier chain, introduce per-netns chains and allow
> users to register to them. Eventually this would eliminate unnecessary
> overhead when there are many netdevices in many network namespaces.
> 
> Signed-off-by: Jiri Pirko <jiri@mellanox.com>

Ok, so there was a discussion about stop semantics.

Honestly, I think that's fine.

Stop means the operation cannot be performed and whoever is firing off
the notifier will have to fail and undo the config change being
attempted.

In that context, it doesn't matter who or where in the chain we
trigger the stop.

Given all of that I am pretty sure this change is fine and I will
add it to net-next.  We can fix any actual semantic problems this
might introduce as a follow-on.

Patch

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 4f390eec106b..184f54f1b9e1 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2494,6 +2494,9 @@  const char *netdev_cmd_to_name(enum netdev_cmd cmd);
 
 int register_netdevice_notifier(struct notifier_block *nb);
 int unregister_netdevice_notifier(struct notifier_block *nb);
+int register_netdevice_notifier_net(struct net *net, struct notifier_block *nb);
+int unregister_netdevice_notifier_net(struct net *net,
+				      struct notifier_block *nb);
 
 struct netdev_notifier_info {
 	struct net_device	*dev;
diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index c5a98e03591d..5ac2bb16d4b3 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -36,6 +36,7 @@ 
 #include <linux/ns_common.h>
 #include <linux/idr.h>
 #include <linux/skbuff.h>
+#include <linux/notifier.h>
 
 struct user_namespace;
 struct proc_dir_entry;
@@ -96,6 +97,8 @@  struct net {
 	struct list_head 	dev_base_head;
 	struct hlist_head 	*dev_name_head;
 	struct hlist_head	*dev_index_head;
+	struct raw_notifier_head	netdev_chain;
+
 	unsigned int		dev_base_seq;	/* protected by rtnl_mutex */
 	int			ifindex;
 	unsigned int		dev_unreg_count;
diff --git a/net/core/dev.c b/net/core/dev.c
index 6a87d0e71201..3302cefd3041 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1766,6 +1766,80 @@  int unregister_netdevice_notifier(struct notifier_block *nb)
 }
 EXPORT_SYMBOL(unregister_netdevice_notifier);
 
+/**
+ * register_netdevice_notifier_net - register a per-netns network notifier block
+ * @net: network namespace
+ * @nb: notifier
+ *
+ * Register a notifier to be called when network device events occur.
+ * The notifier passed is linked into the kernel structures and must
+ * not be reused until it has been unregistered. A negative errno code
+ * is returned on a failure.
+ *
+ * When registered, all registration and up events are replayed
+ * to the new notifier to allow the device to have a race-free
+ * view of the network device list.
+ */
+
+int register_netdevice_notifier_net(struct net *net, struct notifier_block *nb)
+{
+	int err;
+
+	rtnl_lock();
+	err = raw_notifier_chain_register(&net->netdev_chain, nb);
+	if (err)
+		goto unlock;
+	if (dev_boot_phase)
+		goto unlock;
+
+	err = call_netdevice_register_net_notifiers(nb, net);
+	if (err)
+		goto chain_unregister;
+
+unlock:
+	rtnl_unlock();
+	return err;
+
+chain_unregister:
+	raw_notifier_chain_unregister(&net->netdev_chain, nb);
+	goto unlock;
+}
+EXPORT_SYMBOL(register_netdevice_notifier_net);
+
+/**
+ * unregister_netdevice_notifier_net - unregister a per-netns
+ *                                     network notifier block
+ * @net: network namespace
+ * @nb: notifier
+ *
+ * Unregister a notifier previously registered by
+ * register_netdevice_notifier_net(). The notifier is unlinked from the
+ * kernel structures and may then be reused. A negative errno code
+ * is returned on a failure.
+ *
+ * After unregistering, unregister and down device events are synthesized
+ * for all devices on the device list to the removed notifier to remove
+ * the need for special case cleanup code.
+ */
+
+int unregister_netdevice_notifier_net(struct net *net,
+				      struct notifier_block *nb)
+{
+	int err;
+
+	rtnl_lock();
+	err = raw_notifier_chain_unregister(&net->netdev_chain, nb);
+	if (err)
+		goto unlock;
+
+	call_netdevice_unregister_net_notifiers(nb, net);
+
+unlock:
+	rtnl_unlock();
+	return err;
+}
+EXPORT_SYMBOL(unregister_netdevice_notifier_net);
+
 /**
  *	call_netdevice_notifiers_info - call all network notifier blocks
  *	@val: value passed unmodified to notifier function
@@ -1778,7 +1852,18 @@  EXPORT_SYMBOL(unregister_netdevice_notifier);
 static int call_netdevice_notifiers_info(unsigned long val,
 					 struct netdev_notifier_info *info)
 {
+	struct net *net = dev_net(info->dev);
+	int ret;
+
 	ASSERT_RTNL();
+
+	/* Run per-netns notifier block chain first, then run the global one.
+	 * Hopefully, one day, the global one is going to be removed after
+	 * all notifier block registrators get converted to be per-netns.
+	 */
+	ret = raw_notifier_call_chain(&net->netdev_chain, val, info);
+	if (ret & NOTIFY_STOP_MASK)
+		return ret;
 	return raw_notifier_call_chain(&netdev_chain, val, info);
 }
 
@@ -9668,6 +9753,8 @@  static int __net_init netdev_init(struct net *net)
 	if (net->dev_index_head == NULL)
 		goto err_idx;
 
+	RAW_INIT_NOTIFIER_HEAD(&net->netdev_chain);
+
 	return 0;
 
 err_idx: