Message ID | 1472793514-31850-1-git-send-email-mahesh@bandewar.net |
---|---|
State | Accepted, archived |
Delegated to: | David Miller |
Headers | show |
Fri, Sep 02, 2016 at 07:18:34AM CEST, mahesh@bandewar.net wrote: >From: Mahesh Bandewar <maheshb@google.com> > >Following few steps will crash kernel - > > (a) Create bonding master > > modprobe bonding miimon=50 > (b) Create macvlan bridge on eth2 > > ip link add link eth2 dev mvl0 address aa:0:0:0:0:01 \ > type macvlan > (c) Now try adding eth2 into the bond > > echo +eth2 > /sys/class/net/bond0/bonding/slaves > <crash> > >Bonding does lots of things before checking if the device enslaved is >busy or not. > >In this case when the notifier call-chain sends notifications, the >bond_netdev_event() assumes that the rx_handler /rx_handler_data is >registered while the bond_enslave() hasn't progressed far enough to >register rx_handler for the new slave. > >This patch adds a rx_handler check that can be performed right at the >beginning of the enslave code to avoid getting into this situation. > >Signed-off-by: Mahesh Bandewar <maheshb@google.com> >--- > drivers/net/bonding/bond_main.c | 7 ++++--- > include/linux/netdevice.h | 1 + > net/core/dev.c | 16 ++++++++++++++++ > 3 files changed, 21 insertions(+), 3 deletions(-) > >diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c >index 217e8da0628c..9599ed6f1213 100644 >--- a/drivers/net/bonding/bond_main.c >+++ b/drivers/net/bonding/bond_main.c >@@ -1341,9 +1341,10 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev) > slave_dev->name); > } > >- /* already enslaved */ >- if (slave_dev->flags & IFF_SLAVE) { >- netdev_dbg(bond_dev, "Error: Device was already enslaved\n"); >+ /* already in-use? */ >+ if (netdev_is_rx_handler_busy(slave_dev)) { >+ netdev_err(bond_dev, >+ "Error: Device is in use and cannot be enslaved\n"); > return -EBUSY; > } > >diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h >index d122be9345c7..99ba59a7b114 100644 >--- a/include/linux/netdevice.h >+++ b/include/linux/netdevice.h >@@ -3266,6 +3266,7 @@ static inline void napi_free_frags(struct napi_struct *napi) > napi->skb = NULL; > } > >+bool netdev_is_rx_handler_busy(struct net_device *dev); > int netdev_rx_handler_register(struct net_device *dev, > rx_handler_func_t *rx_handler, > void *rx_handler_data); >diff --git a/net/core/dev.c b/net/core/dev.c >index 34b5322bc081..81e6f7298122 100644 >--- a/net/core/dev.c >+++ b/net/core/dev.c >@@ -3965,6 +3965,22 @@ sch_handle_ingress(struct sk_buff *skb, struct packet_type **pt_prev, int *ret, > } > > /** >+ * netdev_is_rx_handler_busy - check if receive handler is registered >+ * @dev: device to check >+ * >+ * Check if a receive handler is already registered for a given device. >+ * Return true if there one. >+ * >+ * The caller must hold the rtnl_mutex. >+ */ >+bool netdev_is_rx_handler_busy(struct net_device *dev) >+{ >+ ASSERT_RTNL(); >+ return dev && rtnl_dereference(dev->rx_handler); >+} >+EXPORT_SYMBOL_GPL(netdev_is_rx_handler_busy); No, please, don't make bonding a spacial citizen introducing this. Please handle the issue inside the bonding code, like we do for the rest of master devices (and how it was once done for bonding). Thanks.
On Fri, 2016-09-02 at 09:52 +0200, Jiri Pirko wrote: > > No, please, don't make bonding a spacial citizen introducing this. > Please handle the issue inside the bonding code, like we do for the rest > of master devices (and how it was once done for bonding). Thanks. I do not feel this netdev_is_rx_handler_busy() use is special to bonding. It makes sense to use it early, to avoid complex rollback, once various events have been sent all over. bond_enslave() is 447 lines long already. You perfectly know how hard it is to 'handle the issue inside the bonding code' as you chose to not fix bonding and write team instead. So Mahesh patch makes perfect sense to me. It exactly fixes a stupid bond_enslave() behavior, trying to set rx_handler way too late. By doing sanity checks before any action, we simply do not have to add complex rollback. Acked-by: Eric Dumazet <edumazet@google.com> Thanks.
On 9/1/16 11:18 PM, Mahesh Bandewar wrote: > From: Mahesh Bandewar <maheshb@google.com> > > Following few steps will crash kernel - > > (a) Create bonding master > > modprobe bonding miimon=50 > (b) Create macvlan bridge on eth2 > > ip link add link eth2 dev mvl0 address aa:0:0:0:0:01 \ > type macvlan > (c) Now try adding eth2 into the bond > > echo +eth2 > /sys/class/net/bond0/bonding/slaves > <crash> > > Bonding does lots of things before checking if the device enslaved is > busy or not. > > In this case when the notifier call-chain sends notifications, the > bond_netdev_event() assumes that the rx_handler /rx_handler_data is > registered while the bond_enslave() hasn't progressed far enough to > register rx_handler for the new slave. > > This patch adds a rx_handler check that can be performed right at the > beginning of the enslave code to avoid getting into this situation. > > Signed-off-by: Mahesh Bandewar <maheshb@google.com> > --- > drivers/net/bonding/bond_main.c | 7 ++++--- > include/linux/netdevice.h | 1 + > net/core/dev.c | 16 ++++++++++++++++ > 3 files changed, 21 insertions(+), 3 deletions(-) > > diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c > index 217e8da0628c..9599ed6f1213 100644 > --- a/drivers/net/bonding/bond_main.c > +++ b/drivers/net/bonding/bond_main.c > @@ -1341,9 +1341,10 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev) > slave_dev->name); > } > > - /* already enslaved */ > - if (slave_dev->flags & IFF_SLAVE) { > - netdev_dbg(bond_dev, "Error: Device was already enslaved\n"); > + /* already in-use? */ > + if (netdev_is_rx_handler_busy(slave_dev)) { > + netdev_err(bond_dev, > + "Error: Device is in use and cannot be enslaved\n"); > return -EBUSY; > } > This check duplicates what netdev_rx_handler_register does. Why not move the call to netdev_rx_handler_register here and then call unregister on failure paths?
On Fri, 2016-09-02 at 08:30 -0600, David Ahern wrote: > This check duplicates what netdev_rx_handler_register does. Why not > move the call to netdev_rx_handler_register here and then call > unregister on failure paths? As soon as you call netdev_rx_handler_register(), incoming packets will hit your driver and we'll likely crash since the enslaving is not done yet. Really think about RCU, we do rcu_assign_pointer() because we want all prior changes being committed to memory before 'enabling' readers to see the updated rcu protected pointer. There are 9 call sites where netdev_rx_handler_register() is used. We can get rid of the extra check in netdev_rx_handler_register() once all of them are using netdev_is_rx_handler_busy() Since this patch takes care of bonding only, we need to keep the existing check in netdev_rx_handler_register() Anyway, we are speaking of control function, an extra check is simply safer, like all the ASSERT_RTNL() we do have...
On 9/2/16 8:45 AM, Eric Dumazet wrote: > On Fri, 2016-09-02 at 08:30 -0600, David Ahern wrote: > >> This check duplicates what netdev_rx_handler_register does. Why not >> move the call to netdev_rx_handler_register here and then call >> unregister on failure paths? > > As soon as you call netdev_rx_handler_register(), incoming packets will > hit your driver and we'll likely crash since the enslaving is not done > yet. > > Really think about RCU, we do rcu_assign_pointer() because we want all > prior changes being committed to memory before 'enabling' readers to see > the updated rcu protected pointer. > > There are 9 call sites where netdev_rx_handler_register() is used. > > We can get rid of the extra check in netdev_rx_handler_register() once > all of them are using netdev_is_rx_handler_busy() > > Since this patch takes care of bonding only, we need to keep the > existing check in netdev_rx_handler_register() > > Anyway, we are speaking of control function, an extra check is simply > safer, like all the ASSERT_RTNL() we do have... > I hit this same problem yesterday but with the bridge. I forgot I had a macvlan device on an interface and tried to enslave it to a bridge. It failed with EBUSY without crashing the kernel so it is one example that handles the conflict, and the bridge also calls the register before the enslaving is done.
On Fri, 2016-09-02 at 08:57 -0600, David Ahern wrote: > I hit this same problem yesterday but with the bridge. I forgot I had > a macvlan device on an interface and tried to enslave it to a bridge. > It failed with EBUSY without crashing the kernel so it is one example > that handles the conflict, and the bridge also calls the register > before the enslaving is done. > Sure, some netdev_rx_handler_register() users are simpler than bonding, this was the point Jiri raised. Hey guys just use team instead of bonding ;) If you carefully look at bridge code, you'll find other kind of errors for sure ;) For example, should dev_disable_lro() be called before or after netdev_rx_handler_register() ? ;) Mahesh proposal allows for simplification, since one level can be removed from the error rollback chain. Ideally the netdev_rx_handler_register() should be called only when no further errors can be detected in the 'enslaving' process, otherwise some live packets could come and be incorrectly processed/dropped by a not fully initialized driver. So it is a chicken and egg problem, if you allow netdev_rx_handler_register() to return an error, while it is so easy to early check what could go wrong with it.
On Fri, 2016-09-02 at 08:21 -0700, Eric Dumazet wrote: > Ideally the netdev_rx_handler_register() should be called only when no > further errors can be detected in the 'enslaving' process, otherwise > some live packets could come and be incorrectly processed/dropped by a > not fully initialized driver. > > So it is a chicken and egg problem, if you allow > netdev_rx_handler_register() to return an error, while it is so easy > to early check what could go wrong with it. One final point is that this patch is easy to backport to stable versions. bond_enslave() is a monster, and changing the rollback chain is likely to cause backport merge conflicts.
Fri, Sep 02, 2016 at 03:33:20PM CEST, eric.dumazet@gmail.com wrote: >On Fri, 2016-09-02 at 09:52 +0200, Jiri Pirko wrote: > >> >> No, please, don't make bonding a spacial citizen introducing this. >> Please handle the issue inside the bonding code, like we do for the rest >> of master devices (and how it was once done for bonding). Thanks. > >I do not feel this netdev_is_rx_handler_busy() use is special to >bonding. > >It makes sense to use it early, to avoid complex rollback, once various >events have been sent all over. > >bond_enslave() is 447 lines long already. > >You perfectly know how hard it is to 'handle the issue inside the >bonding code' as you chose to not fix bonding and write team instead. > >So Mahesh patch makes perfect sense to me. It exactly fixes a stupid >bond_enslave() behavior, trying to set rx_handler way too late. > >By doing sanity checks before any action, we simply do not have to add >complex rollback. Understand the reason. All I say is we tried hard (I surely did) in the past to remove bonding specific quirks and now we are adding one. And the simple reason is that the code is such a mess. Just use team instead and you'll be fine. You can "google" it :)
On Fri, 2016-09-02 at 18:25 +0200, Jiri Pirko wrote: > Understand the reason. All I say is we tried hard (I surely did) in > the past to remove bonding specific quirks and now we are adding one. > And the simple reason is that the code is such a mess. > > Just use team instead and you'll be fine. You can "google" it :) Sure, please join _our_ team and make all the needed changes in user land. The kernel part of it is epsilon compared to all the control part (ie talk to the various Google switches.), and monitoring. I am fine to leave the bug in upstream bonding driver, if you really want to force people out of bonding land ! Here an automated test simply used macvlan and did not remove the macvlan before enslaving the netdev again in a bonding, we already fixed the test, because it is faster than deploying new kernels anyway.
On Fri, Sep 2, 2016 at 9:37 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote: > On Fri, 2016-09-02 at 18:25 +0200, Jiri Pirko wrote: > >> Understand the reason. All I say is we tried hard (I surely did) in >> the past to remove bonding specific quirks and now we are adding one. >> And the simple reason is that the code is such a mess. >> I would not qualify this as bonding specific change! The check of you can really use the interface is simple enough, and can be used by other virtual drivers (e.g. macvlan, ipvlan, vrf etc.). Waiting until rx_handler_register() can be called in your code and then try to do rollback is bad. Also doing register() early has it's own consequences. On contrary, I would argue that this is much cleaner. Agreed that bonding-driver has become a monster but that doesn't mean we should not fix issues. This argument would be valid when we are dealing with last couple installations / use cases of bonding and everyone else has moved to alternatives. Unfortunately we are not there yet :( >> Just use team instead and you'll be fine. You can "google" it :) > > Sure, please join _our_ team and make all the needed changes in user > land. > :) Cannot put it more eloquently than Eric did but we would (in theory) love to move to team-driver, but logistics don't support this (yet!). > The kernel part of it is epsilon compared to all the control part (ie > talk to the various Google switches.), and monitoring. > > I am fine to leave the bug in upstream bonding driver, if you really > want to force people out of bonding land ! > > Here an automated test simply used macvlan and did not remove the > macvlan before enslaving the netdev again in a bonding, we already fixed > the test, because it is faster than deploying new kernels anyway. > > >
Fri, Sep 02, 2016 at 07:08:54PM CEST, maheshb@google.com wrote: >On Fri, Sep 2, 2016 at 9:37 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote: >> On Fri, 2016-09-02 at 18:25 +0200, Jiri Pirko wrote: >> >>> Understand the reason. All I say is we tried hard (I surely did) in >>> the past to remove bonding specific quirks and now we are adding one. >>> And the simple reason is that the code is such a mess. >>> >I would not qualify this as bonding specific change! The check of you can >really use the interface is simple enough, and can be used by other virtual >drivers (e.g. macvlan, ipvlan, vrf etc.). Waiting until >rx_handler_register() can be >called in your code and then try to do rollback is bad. Also doing >register() early >has it's own consequences. On contrary, I would argue that this is much cleaner. Fair enough. Would be great to do this in all other rx_handler users so we are consistent. Thanks! > >Agreed that bonding-driver has become a monster but that doesn't mean we >should not fix issues. This argument would be valid when we are >dealing with last >couple installations / use cases of bonding and everyone else has moved to >alternatives. Unfortunately we are not there yet :( > >>> Just use team instead and you'll be fine. You can "google" it :) >> >> Sure, please join _our_ team and make all the needed changes in user >> land. >> >:) Cannot put it more eloquently than Eric did but we would (in >theory) love to move >to team-driver, but logistics don't support this (yet!). > >> The kernel part of it is epsilon compared to all the control part (ie >> talk to the various Google switches.), and monitoring. >> >> I am fine to leave the bug in upstream bonding driver, if you really >> want to force people out of bonding land ! >> >> Here an automated test simply used macvlan and did not remove the >> macvlan before enslaving the netdev again in a bonding, we already fixed >> the test, because it is faster than deploying new kernels anyway. >> >> >>
From: Mahesh Bandewar <mahesh@bandewar.net> Date: Thu, 1 Sep 2016 22:18:34 -0700 > From: Mahesh Bandewar <maheshb@google.com> > > Following few steps will crash kernel - > > (a) Create bonding master > > modprobe bonding miimon=50 > (b) Create macvlan bridge on eth2 > > ip link add link eth2 dev mvl0 address aa:0:0:0:0:01 \ > type macvlan > (c) Now try adding eth2 into the bond > > echo +eth2 > /sys/class/net/bond0/bonding/slaves > <crash> > > Bonding does lots of things before checking if the device enslaved is > busy or not. > > In this case when the notifier call-chain sends notifications, the > bond_netdev_event() assumes that the rx_handler /rx_handler_data is > registered while the bond_enslave() hasn't progressed far enough to > register rx_handler for the new slave. > > This patch adds a rx_handler check that can be performed right at the > beginning of the enslave code to avoid getting into this situation. > > Signed-off-by: Mahesh Bandewar <maheshb@google.com> Applied and queued up for -stable, thanks.
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c index 217e8da0628c..9599ed6f1213 100644 --- a/drivers/net/bonding/bond_main.c +++ b/drivers/net/bonding/bond_main.c @@ -1341,9 +1341,10 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev) slave_dev->name); } - /* already enslaved */ - if (slave_dev->flags & IFF_SLAVE) { - netdev_dbg(bond_dev, "Error: Device was already enslaved\n"); + /* already in-use? */ + if (netdev_is_rx_handler_busy(slave_dev)) { + netdev_err(bond_dev, + "Error: Device is in use and cannot be enslaved\n"); return -EBUSY; } diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index d122be9345c7..99ba59a7b114 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -3266,6 +3266,7 @@ static inline void napi_free_frags(struct napi_struct *napi) napi->skb = NULL; } +bool netdev_is_rx_handler_busy(struct net_device *dev); int netdev_rx_handler_register(struct net_device *dev, rx_handler_func_t *rx_handler, void *rx_handler_data); diff --git a/net/core/dev.c b/net/core/dev.c index 34b5322bc081..81e6f7298122 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -3965,6 +3965,22 @@ sch_handle_ingress(struct sk_buff *skb, struct packet_type **pt_prev, int *ret, } /** + * netdev_is_rx_handler_busy - check if receive handler is registered + * @dev: device to check + * + * Check if a receive handler is already registered for a given device. + * Return true if there one. + * + * The caller must hold the rtnl_mutex. + */ +bool netdev_is_rx_handler_busy(struct net_device *dev) +{ + ASSERT_RTNL(); + return dev && rtnl_dereference(dev->rx_handler); +} +EXPORT_SYMBOL_GPL(netdev_is_rx_handler_busy); + +/** * netdev_rx_handler_register - register receive handler * @dev: device to register a handler for * @rx_handler: receive handler to register