Message ID | 20190930081511.26915-3-jiri@resnulli.us |
---|---|
State | Accepted |
Delegated to: | David Miller |
Series | net: introduce per-netns netdevice notifiers and use them in mlxsw |
> static int call_netdevice_notifiers_info(unsigned long val,
> 					 struct netdev_notifier_info *info)
> {
> +	struct net *net = dev_net(info->dev);
> +	int ret;
> +
> 	ASSERT_RTNL();
> +
> +	/* Run per-netns notifier block chain first, then run the global one.
> +	 * Hopefully, one day, the global one is going to be removed after
> +	 * all notifier block registrators get converted to be per-netns.
> +	 */

Hi Jiri

Is that really going to happen? register_netdevice_notifier() is used
in 130 files. Do you plan to spend the time to make it happen?

> +	ret = raw_notifier_call_chain(&net->netdev_chain, val, info);
> +	if (ret & NOTIFY_STOP_MASK)
> +		return ret;
> 	return raw_notifier_call_chain(&netdev_chain, val, info);
> }

Humm. I wonder about NOTIFY_STOP_MASK here. These are two separate
chains. Should one chain be able to stop the other chain? Are there
other examples where NOTIFY_STOP_MASK crosses a chain boundary?

   Andrew
Mon, Sep 30, 2019 at 03:38:24PM CEST, andrew@lunn.ch wrote:
>> static int call_netdevice_notifiers_info(unsigned long val,
>> 					 struct netdev_notifier_info *info)
>> {
>> +	struct net *net = dev_net(info->dev);
>> +	int ret;
>> +
>> 	ASSERT_RTNL();
>> +
>> +	/* Run per-netns notifier block chain first, then run the global one.
>> +	 * Hopefully, one day, the global one is going to be removed after
>> +	 * all notifier block registrators get converted to be per-netns.
>> +	 */
>
>Hi Jiri
>
>Is that really going to happen? register_netdevice_notifier() is used
>in 130 files. Do you plan to spend the time to make it happen?

That's why I prepended the sentence with "Hopefully, one day"...

>
>> +	ret = raw_notifier_call_chain(&net->netdev_chain, val, info);
>> +	if (ret & NOTIFY_STOP_MASK)
>> +		return ret;
>> 	return raw_notifier_call_chain(&netdev_chain, val, info);
>> }
>
>Humm. I wonder about NOTIFY_STOP_MASK here. These are two separate
>chains. Should one chain be able to stop the other chain? Are there

Well, if the failing item were in the second chain, at the beginning
of it, processing would be stopped too. It does not matter where the stop
happens; the point is that the whole processing stops. That is why I added
the check here.

>other examples where NOTIFY_STOP_MASK crosses a chain boundary?

Not aware of it, no. Could you please describe what is wrong?

>
>   Andrew
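The point under debate here (a stop in the per-netns chain also prevents the global chain from running) can be seen in a small userspace model. This is only a sketch: `struct model_nb`, `model_call_chain()`, `model_call_both()` and the two callbacks are hypothetical stand-ins, not kernel API. Only the `NOTIFY_*` values are taken from `include/linux/notifier.h`.

```c
#include <assert.h>
#include <stddef.h>

/* Return codes as defined in include/linux/notifier.h. */
#define NOTIFY_DONE      0x0000
#define NOTIFY_OK        0x0001
#define NOTIFY_STOP_MASK 0x8000
#define NOTIFY_BAD       (NOTIFY_STOP_MASK | 0x0002)

/* Hypothetical userspace stand-in for struct notifier_block. */
struct model_nb {
	int (*call)(struct model_nb *nb, unsigned long val);
	struct model_nb *next;
	int calls;	/* how many times this block was invoked */
};

/* Walk one chain; stop at the first callback that sets NOTIFY_STOP_MASK. */
static int model_call_chain(struct model_nb *head, unsigned long val)
{
	int ret = NOTIFY_DONE;

	for (struct model_nb *nb = head; nb; nb = nb->next) {
		assert(nb->call != NULL);
		nb->calls++;
		ret = nb->call(nb, val);
		if (ret & NOTIFY_STOP_MASK)
			break;
	}
	return ret;
}

/* Same shape as the patched call_netdevice_notifiers_info():
 * per-netns chain first, global chain only if nobody stopped.
 */
static int model_call_both(struct model_nb *netns_head,
			   struct model_nb *global_head, unsigned long val)
{
	int ret = model_call_chain(netns_head, val);

	if (ret & NOTIFY_STOP_MASK)
		return ret;
	return model_call_chain(global_head, val);
}

/* Hypothetical listener that accepts every event. */
static int model_ok(struct model_nb *nb, unsigned long val)
{
	(void)nb;
	(void)val;
	return NOTIFY_OK;
}

/* Hypothetical listener that vetoes every event. */
static int model_veto(struct model_nb *nb, unsigned long val)
{
	(void)nb;
	(void)val;
	return NOTIFY_BAD;
}
```

With a vetoing block on the per-netns chain, the block on the global chain is never invoked. That is the behaviour Andrew is asking about and the one Jiri considers equivalent to a stop early in a single chain.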
On Mon, Sep 30, 2019 at 04:23:49PM +0200, Jiri Pirko wrote:
> Mon, Sep 30, 2019 at 03:38:24PM CEST, andrew@lunn.ch wrote:
> >> static int call_netdevice_notifiers_info(unsigned long val,
> >> 					 struct netdev_notifier_info *info)
> >> {
> >> +	struct net *net = dev_net(info->dev);
> >> +	int ret;
> >> +
> >> 	ASSERT_RTNL();
> >> +
> >> +	/* Run per-netns notifier block chain first, then run the global one.
> >> +	 * Hopefully, one day, the global one is going to be removed after
> >> +	 * all notifier block registrators get converted to be per-netns.
> >> +	 */
> >
> >Hi Jiri
> >
> >Is that really going to happen? register_netdevice_notifier() is used
> >in 130 files. Do you plan to spend the time to make it happen?
>
> That's why I prepended the sentence with "Hopefully, one day"...
>
> >
> >> +	ret = raw_notifier_call_chain(&net->netdev_chain, val, info);
> >> +	if (ret & NOTIFY_STOP_MASK)
> >> +		return ret;
> >> 	return raw_notifier_call_chain(&netdev_chain, val, info);
> >> }
> >
> >Humm. I wonder about NOTIFY_STOP_MASK here. These are two separate
> >chains. Should one chain be able to stop the other chain? Are there
>
> Well if the failing item would be in the second chain, at the beginning
> of it, it would be stopped too. Does not matter where the stop happens,
> the point is that the whole processing stops. That is why I added the
> check here.
>
> >other examples where NOTIFY_STOP_MASK crosses a chain boundary?
>
> Not aware of it, no. Could you please describe what is wrong?

You are expanding the meaning of NOTIFY_STOP_MASK. It now can stop
some other chain. If this was one chain with a filter, I would not be
asking. But this is two different chains, and one chain can stop
another? At minimum, I think this needs to be reviewed by the core
kernel people.

But I'm also wondering if you are solving the problem at the wrong
level. Are there other notifier chains which would benefit from
respecting namespace boundaries? Would a better solution be to extend
struct notifier_block with some sort of filter?

Do you have some performance numbers? Where are you getting your
performance gains from? By the fact you are doing NOTIFY_STOP_MASK
earlier, so preventing a long chain being walked? I notice
notifier_block has a priority field. Did you try using that to put your
notifier earlier on the chain?

   Andrew
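For readers unfamiliar with the priority field mentioned above: a notifier chain is kept sorted so that blocks with a higher `priority` run first. The following is a minimal userspace sketch of that ordering rule, under the assumption that equal-priority blocks keep registration order. `struct prio_nb` and `prio_chain_register()` are hypothetical names for illustration, not the kernel's implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the ordering-relevant part of a notifier block. */
struct prio_nb {
	int priority;		/* higher value runs earlier */
	struct prio_nb *next;
};

/* Insert n so that the chain stays sorted by descending priority.
 * Blocks with equal priority keep their registration order.
 */
static void prio_chain_register(struct prio_nb **head, struct prio_nb *n)
{
	struct prio_nb **pos = head;

	assert(head != NULL && n != NULL);
	while (*pos && n->priority <= (*pos)->priority)
		pos = &(*pos)->next;
	n->next = *pos;
	*pos = n;
}
```

Priority only changes where a block sits in the walk. Every block ahead of a stop is still called for every event, so ordering alone does not reduce the number of callbacks that receive events from namespaces they do not care about.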
Mon, Sep 30, 2019 at 05:33:43PM CEST, andrew@lunn.ch wrote:
>On Mon, Sep 30, 2019 at 04:23:49PM +0200, Jiri Pirko wrote:
>> Mon, Sep 30, 2019 at 03:38:24PM CEST, andrew@lunn.ch wrote:
>> >> static int call_netdevice_notifiers_info(unsigned long val,
>> >> 					 struct netdev_notifier_info *info)
>> >> {
>> >> +	struct net *net = dev_net(info->dev);
>> >> +	int ret;
>> >> +
>> >> 	ASSERT_RTNL();
>> >> +
>> >> +	/* Run per-netns notifier block chain first, then run the global one.
>> >> +	 * Hopefully, one day, the global one is going to be removed after
>> >> +	 * all notifier block registrators get converted to be per-netns.
>> >> +	 */
>> >
>> >Hi Jiri
>> >
>> >Is that really going to happen? register_netdevice_notifier() is used
>> >in 130 files. Do you plan to spend the time to make it happen?
>>
>> That's why I prepended the sentence with "Hopefully, one day"...
>>
>> >
>> >> +	ret = raw_notifier_call_chain(&net->netdev_chain, val, info);
>> >> +	if (ret & NOTIFY_STOP_MASK)
>> >> +		return ret;
>> >> 	return raw_notifier_call_chain(&netdev_chain, val, info);
>> >> }
>> >
>> >Humm. I wonder about NOTIFY_STOP_MASK here. These are two separate
>> >chains. Should one chain be able to stop the other chain? Are there
>>
>> Well if the failing item would be in the second chain, at the beginning
>> of it, it would be stopped too. Does not matter where the stop happens,
>> the point is that the whole processing stops. That is why I added the
>> check here.
>>
>> >other examples where NOTIFY_STOP_MASK crosses a chain boundary?
>>
>> Not aware of it, no. Could you please describe what is wrong?
>
>You are expanding the meaning of NOTIFY_STOP_MASK. It now can stop
>some other chain. If this was one chain with a filter, I would not be

Well, it was originally a single chain, so the semantics stay intact.
Again, it is not some other independent chain. It's just the netns one and
the general one; both serve the same purpose.

>asking. But this is two different chains, and one chain can stop
>another? At minimum, I think this needs to be reviewed by the core
>kernel people.
>
>But I'm also wondering if you are solving the problem at the wrong
>level. Are there other notifier chains which would benefit from
>respecting namespace boundaries? Would a better solution be to extend
>struct notifier_block with some sort of filter?

I mentioned my primary motivation in the cover letter. What I want to
avoid is the need to take &pernet_ops_rwsem during registration of
the notifier, and so avoid a deadlock in my use case. Plus it seems very
clear that if a notifier knows which netns it is interested in, it just
registers in that particular netns chain. Having one fat generic chain
with filters is basically what we have right now.

>
>Do you have some performance numbers? Where are you getting your
>performance gains from? By the fact you are doing NOTIFY_STOP_MASK
>earlier, so preventing a long chain being walked? I notice
>notifier_block has a priority field. Did you try using that to put your
>notifier earlier on the chain?

It is not about stopping the chain earlier, not at all. It is the fact
that with many netdevices in many network namespaces you get a lot of
wasted calls to notifier registrants that do not care.

>
>   Andrew
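The "wasted calls" argument can be made concrete with a toy count. The sketch below assumes a deliberately simple scenario that is not taken from the thread: each namespace has exactly one listener that only cares about its own namespace, and each device generates one event. All names (`struct call_stats`, `count_global_chain()`, `count_per_netns_chains()`) are hypothetical.

```c
#include <assert.h>

/* Tally of callback invocations in the toy scenario. */
struct call_stats {
	unsigned long invoked;	/* callbacks that were called */
	unsigned long useful;	/* calls whose event was in the listener's netns */
};

/* One global chain: every listener sees every event and filters by netns. */
static struct call_stats count_global_chain(unsigned int nr_netns,
					    unsigned int devs_per_netns)
{
	struct call_stats s = { 0, 0 };

	assert(nr_netns > 0);
	for (unsigned int ev_ns = 0; ev_ns < nr_netns; ev_ns++) {
		for (unsigned int dev = 0; dev < devs_per_netns; dev++) {
			for (unsigned int l = 0; l < nr_netns; l++) {
				s.invoked++;
				if (l == ev_ns)
					s.useful++;
			}
		}
	}
	return s;
}

/* Per-netns chains: an event only reaches the listener of its own netns. */
static struct call_stats count_per_netns_chains(unsigned int nr_netns,
						unsigned int devs_per_netns)
{
	struct call_stats s = { 0, 0 };

	assert(nr_netns > 0);
	for (unsigned int ev_ns = 0; ev_ns < nr_netns; ev_ns++) {
		for (unsigned int dev = 0; dev < devs_per_netns; dev++) {
			s.invoked++;
			s.useful++;
		}
	}
	return s;
}
```

In this model the global chain costs `nr_netns * nr_netns * devs_per_netns` invocations against `nr_netns * devs_per_netns` for per-netns chains. The saving applies only to listeners that actually move to the per-netns chain, since the patch still walks the global chain afterwards.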
From: Jiri Pirko <jiri@resnulli.us>
Date: Mon, 30 Sep 2019 10:15:10 +0200

> From: Jiri Pirko <jiri@mellanox.com>
>
> Often the code for example in drivers is interested in getting notifier
> call only from certain network namespace. In addition to the existing
> global netdevice notifier chain introduce per-netns chains and allow
> users to register to that. Eventually this would eliminate unnecessary
> overhead in case there are many netdevices in many network namespaces.
>
> Signed-off-by: Jiri Pirko <jiri@mellanox.com>

Ok, so there was a discussion about stop semantics.

Honestly, I think that's fine. Stop means the operation cannot be
performed and whoever is firing off the notifier will have to fail and
undo the config change being attempted. In that context, it doesn't
matter who or where in the chain we trigger the stop.

Given all of that I am pretty sure this change is fine and I will add it
to net-next. We can fix any actual semantic problems this might introduce
as a follow-on.
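The "fail and undo" reading of a stop can be sketched in userspace. The two conversion helpers below mirror the logic of `notifier_from_errno()` and `notifier_to_errno()` in `include/linux/notifier.h`. `struct model_dev` and `model_change_mtu()` are hypothetical and only illustrate a caller rolling back when the notifier result carries an error.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define NOTIFY_OK        0x0001
#define NOTIFY_STOP_MASK 0x8000

/* Encode a negative errno into a notifier return value that stops the walk. */
static int model_notifier_from_errno(int err)
{
	if (err)
		return NOTIFY_STOP_MASK | (NOTIFY_OK - err);
	return NOTIFY_OK;
}

/* Recover the negative errno (or 0) from a notifier return value. */
static int model_notifier_to_errno(int ret)
{
	ret &= ~NOTIFY_STOP_MASK;
	return ret > NOTIFY_OK ? NOTIFY_OK - ret : 0;
}

/* Hypothetical device with a single configurable field. */
struct model_dev {
	int mtu;
};

/* Apply a change, then undo it if the notifier result reports an error.
 * notifier_ret stands for whatever the chain walk returned; the caller
 * cannot tell which chain or which block produced it.
 */
static int model_change_mtu(struct model_dev *dev, int new_mtu,
			    int notifier_ret)
{
	int old_mtu;
	int err;

	assert(dev != NULL);
	old_mtu = dev->mtu;
	dev->mtu = new_mtu;
	err = model_notifier_to_errno(notifier_ret);
	if (err)
		dev->mtu = old_mtu;	/* undo the attempted change */
	return err;
}
```

Because the caller only sees the final return value, the rollback is the same whether the veto came from a per-netns block or a global one.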
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 4f390eec106b..184f54f1b9e1 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2494,6 +2494,9 @@ const char *netdev_cmd_to_name(enum netdev_cmd cmd);
 
 int register_netdevice_notifier(struct notifier_block *nb);
 int unregister_netdevice_notifier(struct notifier_block *nb);
+int register_netdevice_notifier_net(struct net *net, struct notifier_block *nb);
+int unregister_netdevice_notifier_net(struct net *net,
+				      struct notifier_block *nb);
 
 struct netdev_notifier_info {
 	struct net_device	*dev;
diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index c5a98e03591d..5ac2bb16d4b3 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -36,6 +36,7 @@
 #include <linux/ns_common.h>
 #include <linux/idr.h>
 #include <linux/skbuff.h>
+#include <linux/notifier.h>
 
 struct user_namespace;
 struct proc_dir_entry;
@@ -96,6 +97,8 @@ struct net {
 	struct list_head	dev_base_head;
 	struct hlist_head	*dev_name_head;
 	struct hlist_head	*dev_index_head;
+	struct raw_notifier_head	netdev_chain;
+
 	unsigned int		dev_base_seq;	/* protected by rtnl_mutex */
 	int			ifindex;
 	unsigned int		dev_unreg_count;
diff --git a/net/core/dev.c b/net/core/dev.c
index 6a87d0e71201..3302cefd3041 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1766,6 +1766,80 @@ int unregister_netdevice_notifier(struct notifier_block *nb)
 }
 EXPORT_SYMBOL(unregister_netdevice_notifier);
 
+/**
+ * register_netdevice_notifier_net - register a per-netns network notifier block
+ * @net: network namespace
+ * @nb: notifier
+ *
+ * Register a notifier to be called when network device events occur.
+ * The notifier passed is linked into the kernel structures and must
+ * not be reused until it has been unregistered. A negative errno code
+ * is returned on a failure.
+ *
+ * When registered all registration and up events are replayed
+ * to the new notifier to allow device to have a race free
+ * view of the network device list.
+ */
+
+int register_netdevice_notifier_net(struct net *net, struct notifier_block *nb)
+{
+	int err;
+
+	rtnl_lock();
+	err = raw_notifier_chain_register(&net->netdev_chain, nb);
+	if (err)
+		goto unlock;
+	if (dev_boot_phase)
+		goto unlock;
+
+	err = call_netdevice_register_net_notifiers(nb, net);
+	if (err)
+		goto chain_unregister;
+
+unlock:
+	rtnl_unlock();
+	return err;
+
+chain_unregister:
+	raw_notifier_chain_unregister(&netdev_chain, nb);
+	goto unlock;
+}
+EXPORT_SYMBOL(register_netdevice_notifier_net);
+
+/**
+ * unregister_netdevice_notifier_net - unregister a per-netns
+ *                                     network notifier block
+ * @net: network namespace
+ * @nb: notifier
+ *
+ * Unregister a notifier previously registered by
+ * register_netdevice_notifier(). The notifier is unlinked into the
+ * kernel structures and may then be reused. A negative errno code
+ * is returned on a failure.
+ *
+ * After unregistering unregister and down device events are synthesized
+ * for all devices on the device list to the removed notifier to remove
+ * the need for special case cleanup code.
+ */
+
+int unregister_netdevice_notifier_net(struct net *net,
+				      struct notifier_block *nb)
+{
+	int err;
+
+	rtnl_lock();
+	err = raw_notifier_chain_unregister(&net->netdev_chain, nb);
+	if (err)
+		goto unlock;
+
+	call_netdevice_unregister_net_notifiers(nb, net);
+
+unlock:
+	rtnl_unlock();
+	return err;
+}
+EXPORT_SYMBOL(unregister_netdevice_notifier_net);
+
 /**
  *	call_netdevice_notifiers_info - call all network notifier blocks
  *	@val: value passed unmodified to notifier function
@@ -1778,7 +1852,18 @@ EXPORT_SYMBOL(unregister_netdevice_notifier);
 static int call_netdevice_notifiers_info(unsigned long val,
 					 struct netdev_notifier_info *info)
 {
+	struct net *net = dev_net(info->dev);
+	int ret;
+
 	ASSERT_RTNL();
+
+	/* Run per-netns notifier block chain first, then run the global one.
+	 * Hopefully, one day, the global one is going to be removed after
+	 * all notifier block registrators get converted to be per-netns.
+	 */
+	ret = raw_notifier_call_chain(&net->netdev_chain, val, info);
+	if (ret & NOTIFY_STOP_MASK)
+		return ret;
 	return raw_notifier_call_chain(&netdev_chain, val, info);
 }
 
@@ -9668,6 +9753,8 @@ static int __net_init netdev_init(struct net *net)
 	if (net->dev_index_head == NULL)
 		goto err_idx;
 
+	RAW_INIT_NOTIFIER_HEAD(&net->netdev_chain);
+
 	return 0;
 
 err_idx:
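The register path in the diff follows a register, replay, roll-back-on-failure pattern. One detail worth noting when reading it: the registration uses `&net->netdev_chain`, while the `chain_unregister` error path as posted passes `&netdev_chain`, which appears to be the global chain. The userspace sketch below models the pattern with the rollback applied to the same chain that was registered on. All names in it (`struct reg_nb`, `struct reg_chain`, `reg_register_net()` and the callbacks) are hypothetical and it is not the kernel implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical listener: replay() returns 0 or a negative errno. */
struct reg_nb {
	int (*replay)(struct reg_nb *nb, int ifindex);
	struct reg_nb *next;
};

/* Hypothetical per-netns chain head. */
struct reg_chain {
	struct reg_nb *head;
};

static void reg_chain_add(struct reg_chain *c, struct reg_nb *nb)
{
	nb->next = c->head;
	c->head = nb;
}

/* Unlink nb from c; -ENOENT if it was not on this chain. */
static int reg_chain_del(struct reg_chain *c, struct reg_nb *nb)
{
	for (struct reg_nb **pos = &c->head; *pos; pos = &(*pos)->next) {
		if (*pos == nb) {
			*pos = nb->next;
			return 0;
		}
	}
	return -ENOENT;
}

/* Register on the per-netns chain, replay existing devices to the new
 * listener, and on failure unlink it from that same chain.
 */
static int reg_register_net(struct reg_chain *netns_chain, struct reg_nb *nb,
			    const int *ifindexes, size_t n)
{
	int err = 0;

	assert(netns_chain != NULL && nb != NULL && nb->replay != NULL);
	reg_chain_add(netns_chain, nb);
	for (size_t i = 0; i < n; i++) {
		err = nb->replay(nb, ifindexes[i]);
		if (err)
			break;
	}
	if (err)
		reg_chain_del(netns_chain, nb);	/* same chain as the add */
	return err;
}

/* Hypothetical listener that accepts every replayed device. */
static int reg_accept(struct reg_nb *nb, int ifindex)
{
	(void)nb;
	(void)ifindex;
	return 0;
}

/* Hypothetical listener that fails while replaying ifindex 2. */
static int reg_reject_second(struct reg_nb *nb, int ifindex)
{
	(void)nb;
	return ifindex == 2 ? -EBUSY : 0;
}
```

If the rollback targeted a different chain, the failed listener would stay linked on the per-netns chain and the unlink attempt on the other chain would find nothing to remove.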