Message ID | 1397632082-18453-1-git-send-email-zhen-hual@hp.com |
---|---|
State | Changes Requested, archived |
Delegated to: | David Miller |
On Wed, Apr 16, 2014 at 03:08:02PM +0800, Li, Zhen-Hua wrote:
>From: "Li, Zhen-Hua" <zhen-hual@hp.com>
>
>As netif_running is called in netif_device_attach/detach. There should be
>rtnl_lock/unlock called, to avoid dev stat change during netif_device_attach
>and detach being called.
>I checked NIC some drivers, some of them have netif_device_attach/detach
>called between rtnl_lock/unlock, while some drivers do not.

It can race with any other thread that takes the lock - i.e. suppose you
have a driver that doesn't take the lock and calls netif_device_attach(),
while another thread (completely unrelated to the issue) holds rtnl_lock -
this way the trylock will return false, the thread that took rtnl releases
it - and you'll see the exact same behaviour as without your patch.

I'm not sure about the issue you're trying to fix here - there might be a
better approach which I'm not aware of, however with your approach you
should really either remove the rtnl locking from all drivers that use this
function (and insert a normal rtnl_lock here) or, vice-versa, add it to all
drivers and add an ASSERT_RTNL to netif_device_detach/attach.

>
>This patch is tring to find a generic way to fix this for all NIC drivers.
>
>Signed-off-by: Li, Zhen-Hua <zhen-hual@hp.com>
>---
> net/core/dev.c | 18 ++++++++++++++++++
> 1 file changed, 18 insertions(+)
>
>diff --git a/net/core/dev.c b/net/core/dev.c
>index 5b3042e..795bbc5 100644
>--- a/net/core/dev.c
>+++ b/net/core/dev.c
>@@ -2190,10 +2190,19 @@ EXPORT_SYMBOL(__dev_kfree_skb_any);
>  */
> void netif_device_detach(struct net_device *dev)
> {
>+	/**
>+	 * As netif_running is called , rtnl_lock and unlock are needed to
>+	 * avoid __LINK_STATE_START bit changes during this function call.
>+	 */
>+	int need_unlock;
>+
>+	need_unlock = rtnl_trylock();
> 	if (test_and_clear_bit(__LINK_STATE_PRESENT, &dev->state) &&
> 	    netif_running(dev)) {
> 		netif_tx_stop_all_queues(dev);
> 	}
>+	if (need_unlock)
>+		rtnl_unlock();
> }
> EXPORT_SYMBOL(netif_device_detach);
>
>@@ -2205,11 +2214,20 @@ EXPORT_SYMBOL(netif_device_detach);
>  */
> void netif_device_attach(struct net_device *dev)
> {
>+	/**
>+	 * As netif_running is called , rtnl_lock and unlock are needed to
>+	 * avoid __LINK_STATE_START bit changes during this function call.
>+	 */
>+	int need_unlock;
>+
>+	need_unlock = rtnl_trylock();
> 	if (!test_and_set_bit(__LINK_STATE_PRESENT, &dev->state) &&
> 	    netif_running(dev)) {
> 		netif_tx_wake_all_queues(dev);
> 		__netdev_watchdog_up(dev);
> 	}
>+	if (need_unlock)
>+		rtnl_unlock();
> }
> EXPORT_SYMBOL(netif_device_attach);
>
>--
>1.7.10.4
>
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
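The race described in the reply above can be modelled in miniature in userspace. The following is only a sketch under stated assumptions, not kernel code: `fake_rtnl`, `fake_rtnl_trylock()`, `fake_rtnl_unlock()`, `detach_with_trylock()` and `detach_while_lock_held_elsewhere()` are hypothetical names, and a C11 `atomic_flag` stands in for the RTNL mutex. It shows that the "trylock, then carry on regardless" pattern runs the work without the lock whenever anyone else happens to hold it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the RTNL mutex (userspace, not kernel code). */
static atomic_flag fake_rtnl = ATOMIC_FLAG_INIT;

/* Returns true if the lock was free and is now held by the caller. */
static bool fake_rtnl_trylock(void)
{
	return !atomic_flag_test_and_set(&fake_rtnl);
}

static void fake_rtnl_unlock(void)
{
	atomic_flag_clear(&fake_rtnl);
}

/*
 * Same shape as the patched netif_device_detach(): try the lock, then do
 * the work whether or not the lock was obtained.
 * Returns true only if the work ran while this caller held the lock.
 */
static bool detach_with_trylock(void)
{
	bool need_unlock = fake_rtnl_trylock();

	/* ... the netif_running() test and queue stop would run here ... */

	if (need_unlock)
		fake_rtnl_unlock();
	return need_unlock;
}

/* Models "an unrelated thread holds rtnl_lock" around one detach call. */
static bool detach_while_lock_held_elsewhere(void)
{
	bool other_has_lock = fake_rtnl_trylock();	/* the unrelated holder */
	bool was_protected = detach_with_trylock();

	if (other_has_lock)
		fake_rtnl_unlock();
	return was_protected;
}
```

In this model the detach path is protected only when nobody else holds the lock, which is the point made in the review: an unrelated holder makes the trylock fail, and the function then behaves exactly as it did before the patch.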
The problem I am trying to fix is this: when netif_device_attach/detach is
called, it gets a return value from netif_running(), but at that moment
another thread may change the state of the device. netif_device_attach()
does not know that the state has changed, and this may cause bugs.

I think you are right: this patch cannot fix a race with another thread that
takes the lock. But that race is already what happens today (without this
patch), and I have not yet found a way to fix it completely.

Another problem is that we only need a lock for this one device, not for
all devices. So how about adding a separate lock for each net device?

Regards
Zhenhua

On 04/16/2014 03:38 PM, Veaceslav Falico wrote:
> On Wed, Apr 16, 2014 at 03:08:02PM +0800, Li, Zhen-Hua wrote:
>> From: "Li, Zhen-Hua" <zhen-hual@hp.com>
>>
>> As netif_running is called in netif_device_attach/detach. There should be
>> rtnl_lock/unlock called, to avoid dev stat change during
>> netif_device_attach
>> and detach being called.
>> I checked NIC some drivers, some of them have netif_device_attach/detach
>> called between rtnl_lock/unlock, while some drivers do not.
>
> It can race with any other thread that takes the lock - i.e. suppose you
> have a driver that doesn't take the lock and calls netif_device_attach(),
> while another thread (completely unrelated to the issue) holds rtnl_lock -
> this way the trylock will return false, the thread that took rtnl releases
> it - and you'll see the exact same behaviour as without your patch.
>
> I'm not sure about the issue you're trying to fix here - there might be a
> better approach which I'm not aware of, however with your approach you
> should really either remove the rtnl locking from all drivers that use this
> function (and insert a normal rtnl_lock here) or, vice-versa, add it to all
> drivers and add an ASSERT_RTNL to netif_device_detach/attach.
>
>>
>> This patch is tring to find a generic way to fix this for all NIC
>> drivers.
>>
>> Signed-off-by: Li, Zhen-Hua <zhen-hual@hp.com>
>> ---
>> net/core/dev.c | 18 ++++++++++++++++++
>> 1 file changed, 18 insertions(+)
>>
>> diff --git a/net/core/dev.c b/net/core/dev.c
>> index 5b3042e..795bbc5 100644
>> --- a/net/core/dev.c
>> +++ b/net/core/dev.c
>> @@ -2190,10 +2190,19 @@ EXPORT_SYMBOL(__dev_kfree_skb_any);
>>   */
>>  void netif_device_detach(struct net_device *dev)
>>  {
>> +	/**
>> +	 * As netif_running is called , rtnl_lock and unlock are needed to
>> +	 * avoid __LINK_STATE_START bit changes during this function call.
>> +	 */
>> +	int need_unlock;
>> +
>> +	need_unlock = rtnl_trylock();
>>  	if (test_and_clear_bit(__LINK_STATE_PRESENT, &dev->state) &&
>>  	    netif_running(dev)) {
>>  		netif_tx_stop_all_queues(dev);
>>  	}
>> +	if (need_unlock)
>> +		rtnl_unlock();
>>  }
>>  EXPORT_SYMBOL(netif_device_detach);
>>
>> @@ -2205,11 +2214,20 @@ EXPORT_SYMBOL(netif_device_detach);
>>   */
>>  void netif_device_attach(struct net_device *dev)
>>  {
>> +	/**
>> +	 * As netif_running is called , rtnl_lock and unlock are needed to
>> +	 * avoid __LINK_STATE_START bit changes during this function call.
>> +	 */
>> +	int need_unlock;
>> +
>> +	need_unlock = rtnl_trylock();
>>  	if (!test_and_set_bit(__LINK_STATE_PRESENT, &dev->state) &&
>>  	    netif_running(dev)) {
>>  		netif_tx_wake_all_queues(dev);
>>  		__netdev_watchdog_up(dev);
>>  	}
>> +	if (need_unlock)
>> +		rtnl_unlock();
>>  }
>>  EXPORT_SYMBOL(netif_device_attach);
>>
>> --
>> 1.7.10.4
>>
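The per-device lock suggested in this reply could look roughly like the userspace sketch below. Everything in it is an assumption made for illustration: `struct toy_dev` and the `toy_dev_*()` helpers do not exist in the kernel, a C11 `atomic_flag` is used as a simple spin lock, and plain `bool` fields stand in for the `__LINK_STATE_START` and `__LINK_STATE_PRESENT` bits. It says nothing about which kernel contexts could safely take such a lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of a device that carries its own lock. */
struct toy_dev {
	atomic_flag lock;	/* per-device lock (simple spin lock) */
	bool running;		/* stands in for __LINK_STATE_START */
	bool present;		/* stands in for __LINK_STATE_PRESENT */
	bool queues_stopped;
};

static void toy_dev_init(struct toy_dev *d, bool running)
{
	atomic_flag_clear(&d->lock);
	d->running = running;
	d->present = true;
	d->queues_stopped = false;
}

static void toy_dev_lock(struct toy_dev *d)
{
	while (atomic_flag_test_and_set_explicit(&d->lock, memory_order_acquire))
		;	/* spin until the holder releases this device's lock */
}

static void toy_dev_unlock(struct toy_dev *d)
{
	atomic_flag_clear_explicit(&d->lock, memory_order_release);
}

/* Close path: clears "running" under the device's own lock. */
static void toy_dev_close(struct toy_dev *d)
{
	toy_dev_lock(d);
	d->running = false;
	toy_dev_unlock(d);
}

/* Detach path: the running test cannot interleave with toy_dev_close(). */
static void toy_dev_detach(struct toy_dev *d)
{
	toy_dev_lock(d);
	if (d->present) {
		d->present = false;
		if (d->running)
			d->queues_stopped = true;
	}
	toy_dev_unlock(d);
}
```

Because each device has its own lock, detaching one device does not wait for a holder of another device's lock, which is the property the reply asks for when it says a lock is needed "for this one device" only.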
Hello.

On 04/16/2014 11:08 AM, Li, Zhen-Hua wrote:

> From: "Li, Zhen-Hua" <zhen-hual@hp.com>

> As netif_running is called in netif_device_attach/detach. There should be
> rtnl_lock/unlock called, to avoid dev stat change during netif_device_attach
> and detach being called.
> I checked NIC some drivers, some of them have netif_device_attach/detach
> called between rtnl_lock/unlock, while some drivers do not.

> This patch is tring to find a generic way to fix this for all NIC drivers.

> Signed-off-by: Li, Zhen-Hua <zhen-hual@hp.com>
> ---
> net/core/dev.c | 18 ++++++++++++++++++
> 1 file changed, 18 insertions(+)

> diff --git a/net/core/dev.c b/net/core/dev.c
> index 5b3042e..795bbc5 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -2190,10 +2190,19 @@ EXPORT_SYMBOL(__dev_kfree_skb_any);
>   */
>  void netif_device_detach(struct net_device *dev)
>  {
> +	/**

   Hm, why kernel-doc style comment here?

> +	 * As netif_running is called , rtnl_lock and unlock are needed to
> +	 * avoid __LINK_STATE_START bit changes during this function call.
> +	 */
> +	int need_unlock;
> +
> +	need_unlock = rtnl_trylock();
>  	if (test_and_clear_bit(__LINK_STATE_PRESENT, &dev->state) &&
>  	    netif_running(dev)) {
>  		netif_tx_stop_all_queues(dev);
>  	}
> +	if (need_unlock)
> +		rtnl_unlock();
>  }
>  EXPORT_SYMBOL(netif_device_detach);
>
> @@ -2205,11 +2214,20 @@ EXPORT_SYMBOL(netif_device_detach);
>   */
>  void netif_device_attach(struct net_device *dev)
>  {
> +	/**

   ... and here?

> +	 * As netif_running is called , rtnl_lock and unlock are needed to
> +	 * avoid __LINK_STATE_START bit changes during this function call.
> +	 */

WBR, Sergei
The comment is trying to explain why add a lock here.

On 04/19/2014 03:01 AM, Sergei Shtylyov wrote:
> Hello.
>
> On 04/16/2014 11:08 AM, Li, Zhen-Hua wrote:
>
>> From: "Li, Zhen-Hua" <zhen-hual@hp.com>
>
>> As netif_running is called in netif_device_attach/detach. There
>> should be
>> rtnl_lock/unlock called, to avoid dev stat change during
>> netif_device_attach
>> and detach being called.
>> I checked NIC some drivers, some of them have
>> netif_device_attach/detach
>> called between rtnl_lock/unlock, while some drivers do not.
>
>> This patch is tring to find a generic way to fix this for all NIC
>> drivers.
>
>> Signed-off-by: Li, Zhen-Hua <zhen-hual@hp.com>
>> ---
>> net/core/dev.c | 18 ++++++++++++++++++
>> 1 file changed, 18 insertions(+)
>
>> diff --git a/net/core/dev.c b/net/core/dev.c
>> index 5b3042e..795bbc5 100644
>> --- a/net/core/dev.c
>> +++ b/net/core/dev.c
>> @@ -2190,10 +2190,19 @@ EXPORT_SYMBOL(__dev_kfree_skb_any);
>>   */
>>  void netif_device_detach(struct net_device *dev)
>>  {
>> +	/**
>
>    Hm, why kernel-doc style comment here?
>
>> +	 * As netif_running is called , rtnl_lock and unlock are needed to
>> +	 * avoid __LINK_STATE_START bit changes during this function call.
>> +	 */
>> +	int need_unlock;
>> +
>> +	need_unlock = rtnl_trylock();
>>  	if (test_and_clear_bit(__LINK_STATE_PRESENT, &dev->state) &&
>>  	    netif_running(dev)) {
>>  		netif_tx_stop_all_queues(dev);
>>  	}
>> +	if (need_unlock)
>> +		rtnl_unlock();
>>  }
>>  EXPORT_SYMBOL(netif_device_detach);
>>
>> @@ -2205,11 +2214,20 @@ EXPORT_SYMBOL(netif_device_detach);
>>   */
>>  void netif_device_attach(struct net_device *dev)
>>  {
>> +	/**
>
>    ... and here?
>
>> +	 * As netif_running is called , rtnl_lock and unlock are needed to
>> +	 * avoid __LINK_STATE_START bit changes during this function call.
>> +	 */
>
> WBR, Sergei
>
Hello.

On 21-04-2014 10:30, Li, ZhenHua wrote:

> The comment is trying to explain why add a lock here.

   I can read, thanks. :-)
   I was wondering about the kernel-doc comment style you've used; AFAIK, it's
only good for documenting functions and data structures. The normal multi-line
comment style in the networking code is this:

/* bla
 * bla
 */

>>> From: "Li, Zhen-Hua" <zhen-hual@hp.com>

>>> As netif_running is called in netif_device_attach/detach. There should be
>>> rtnl_lock/unlock called, to avoid dev stat change during netif_device_attach
>>> and detach being called.
>>> I checked NIC some drivers, some of them have netif_device_attach/detach
>>> called between rtnl_lock/unlock, while some drivers do not.

>>> This patch is tring to find a generic way to fix this for all NIC drivers.

>>> Signed-off-by: Li, Zhen-Hua <zhen-hual@hp.com>
>>> ---
>>> net/core/dev.c | 18 ++++++++++++++++++
>>> 1 file changed, 18 insertions(+)

>>> diff --git a/net/core/dev.c b/net/core/dev.c
>>> index 5b3042e..795bbc5 100644
>>> --- a/net/core/dev.c
>>> +++ b/net/core/dev.c
>>> @@ -2190,10 +2190,19 @@ EXPORT_SYMBOL(__dev_kfree_skb_any);
>>>   */
>>>  void netif_device_detach(struct net_device *dev)
>>>  {
>>> +	/**

>>    Hm, why kernel-doc style comment here?

>>> +	 * As netif_running is called , rtnl_lock and unlock are needed to

   Space before comma not needed.

>>> +	 * avoid __LINK_STATE_START bit changes during this function call.
>>> +	 */
>>> +	int need_unlock;
>>> +
>>> +	need_unlock = rtnl_trylock();
>>>  	if (test_and_clear_bit(__LINK_STATE_PRESENT, &dev->state) &&
>>>  	    netif_running(dev)) {
>>>  		netif_tx_stop_all_queues(dev);
>>>  	}
>>> +	if (need_unlock)
>>> +		rtnl_unlock();
>>>  }
>>>  EXPORT_SYMBOL(netif_device_detach);
>>>
>>> @@ -2205,11 +2214,20 @@ EXPORT_SYMBOL(netif_device_detach);
>>>   */
>>>  void netif_device_attach(struct net_device *dev)
>>>  {
>>> +	/**

>>    ... and here?

>>> +	 * As netif_running is called , rtnl_lock and unlock are needed to

   Space before comma not needed.

>>> +	 * avoid __LINK_STATE_START bit changes during this function call.
>>> +	 */

WBR, Sergei
On Wed, 2014-04-16 at 15:08 +0800, Li, Zhen-Hua wrote:
> From: "Li, Zhen-Hua" <zhen-hual@hp.com>
>
> As netif_running is called in netif_device_attach/detach. There should be
> rtnl_lock/unlock called, to avoid dev stat change during netif_device_attach
> and detach being called.
> I checked NIC some drivers, some of them have netif_device_attach/detach
> called between rtnl_lock/unlock, while some drivers do not.
>
> This patch is tring to find a generic way to fix this for all NIC drivers.

I don't think you can generically use the RTNL lock for this.

> Signed-off-by: Li, Zhen-Hua <zhen-hual@hp.com>
> ---
> net/core/dev.c | 18 ++++++++++++++++++
> 1 file changed, 18 insertions(+)
>
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 5b3042e..795bbc5 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -2190,10 +2190,19 @@ EXPORT_SYMBOL(__dev_kfree_skb_any);
>   */
>  void netif_device_detach(struct net_device *dev)
>  {
> +	/**
> +	 * As netif_running is called , rtnl_lock and unlock are needed to
> +	 * avoid __LINK_STATE_START bit changes during this function call.
> +	 */
> +	int need_unlock;
> +
> +	need_unlock = rtnl_trylock();

It is never correct to use trylock and then continue even if it fails.
I think you're trying to simulate a reentrant mutex but this will fail
if *any* task already has the mutex.

Furthermore it is currently allowed and useful to call these functions
from atomic context (transmit or completion path) where it is not
possible to hold the mutex.

>  	if (test_and_clear_bit(__LINK_STATE_PRESENT, &dev->state) &&
>  	    netif_running(dev)) {
>  		netif_tx_stop_all_queues(dev);
>  	}
> +	if (need_unlock)
> +		rtnl_unlock();
>  }
>  EXPORT_SYMBOL(netif_device_detach);

For netif_device_detach(), I wonder whether it is necessary to check
netif_running(). What are we trying to avoid?

> @@ -2205,11 +2214,20 @@ EXPORT_SYMBOL(netif_device_detach);
>   */
>  void netif_device_attach(struct net_device *dev)
>  {
> +	/**
> +	 * As netif_running is called , rtnl_lock and unlock are needed to
> +	 * avoid __LINK_STATE_START bit changes during this function call.
> +	 */
> +	int need_unlock;
> +
> +	need_unlock = rtnl_trylock();
>  	if (!test_and_set_bit(__LINK_STATE_PRESENT, &dev->state) &&
>  	    netif_running(dev)) {
>  		netif_tx_wake_all_queues(dev);
>  		__netdev_watchdog_up(dev);
>  	}
> +	if (need_unlock)
> +		rtnl_unlock();
>  }
>  EXPORT_SYMBOL(netif_device_attach);

I do see a problem if netif_device_detach() races with
dev_deactivate_many(), which is being mitigated but not avoided by the
test of netif_running(). I think a proper solution is going to involve
changing dev_deactivate_many() as well, removing the use of
netif_running(), and possibly using cmpxchg() to atomically manipulate
multiple bits of dev->state.

Ben.
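The cmpxchg() idea mentioned at the end of this review can be illustrated in userspace with C11 atomics. This is a minimal sketch under stated assumptions, not the change proposed for dev_deactivate_many(): `TOY_STATE_START`, `TOY_STATE_PRESENT` and `toy_detach()` are hypothetical names, the bit positions are arbitrary, and `atomic_compare_exchange_weak()` plays the role that cmpxchg() would play in the kernel. The point is that clearing the "present" bit and observing the "start" bit happen on the same snapshot of the state word.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bit layout, modelled on __LINK_STATE_START/_PRESENT. */
#define TOY_STATE_START   (1u << 0)
#define TOY_STATE_PRESENT (1u << 1)

/*
 * Clear PRESENT and report whether START was set in the very same
 * snapshot of the state word. Returns true if the caller should stop
 * the TX queues, false if the device was already detached or not started.
 */
static bool toy_detach(_Atomic unsigned int *state)
{
	unsigned int old = atomic_load(state);
	unsigned int next;

	do {
		if (!(old & TOY_STATE_PRESENT))
			return false;	/* already detached, nothing to do */
		next = old & ~TOY_STATE_PRESENT;
		/* On failure, old is reloaded with the current value; retry. */
	} while (!atomic_compare_exchange_weak(state, &old, next));

	return (old & TOY_STATE_START) != 0;
}
```

Unlike a separate test_and_clear_bit() followed by netif_running(), there is no window here between changing one bit and reading the other; whether that is sufficient for the real race depends on what the other paths do with the same word.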
diff --git a/net/core/dev.c b/net/core/dev.c
index 5b3042e..795bbc5 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2190,10 +2190,19 @@ EXPORT_SYMBOL(__dev_kfree_skb_any);
  */
 void netif_device_detach(struct net_device *dev)
 {
+	/**
+	 * As netif_running is called , rtnl_lock and unlock are needed to
+	 * avoid __LINK_STATE_START bit changes during this function call.
+	 */
+	int need_unlock;
+
+	need_unlock = rtnl_trylock();
 	if (test_and_clear_bit(__LINK_STATE_PRESENT, &dev->state) &&
 	    netif_running(dev)) {
 		netif_tx_stop_all_queues(dev);
 	}
+	if (need_unlock)
+		rtnl_unlock();
 }
 EXPORT_SYMBOL(netif_device_detach);
 
@@ -2205,11 +2214,20 @@ EXPORT_SYMBOL(netif_device_detach);
  */
 void netif_device_attach(struct net_device *dev)
 {
+	/**
+	 * As netif_running is called , rtnl_lock and unlock are needed to
+	 * avoid __LINK_STATE_START bit changes during this function call.
+	 */
+	int need_unlock;
+
+	need_unlock = rtnl_trylock();
 	if (!test_and_set_bit(__LINK_STATE_PRESENT, &dev->state) &&
 	    netif_running(dev)) {
 		netif_tx_wake_all_queues(dev);
 		__netdev_watchdog_up(dev);
 	}
+	if (need_unlock)
+		rtnl_unlock();
 }
 EXPORT_SYMBOL(netif_device_attach);