Patchwork [net-next] vxlan: fix a soft lockup in vxlan module removal

Submitter Amerigo Wang
Date Aug. 7, 2013, 8:43 a.m.
Message ID <1375865002-8732-1-git-send-email-amwang@redhat.com>
Permalink /patch/265411/
State Accepted
Delegated to: David Miller

Comments

Amerigo Wang - Aug. 7, 2013, 8:43 a.m.
From: Cong Wang <amwang@redhat.com>

This is a regression introduced by:

	commit fe5c3561e6f0ac7c9546209f01351113c1b77ec8
	Author: stephen hemminger <stephen@networkplumber.org>
	Date:   Sat Jul 13 10:18:18 2013 -0700

	    vxlan: add necessary locking on device removal

The problem is that vxlan_dellink(), which is called with the RTNL lock
held, tries to flush the workqueue synchronously, but the igmp_join and
igmp_leave work items need to take the RTNL lock too. The flush waits for
work that cannot make progress, so we get a soft lockup.

As suggested by Stephen, the flush_workqueue() call can probably just be
removed, letting the normal refcounting do its work. The queued work holds
a reference to the device and the socket, so the cleanup should still
happen correctly.

Suggested-by: Stephen Hemminger <stephen@networkplumber.org>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: David S. Miller <davem@davemloft.net>
Tested-by: Cong Wang <amwang@redhat.com>
Signed-off-by: Cong Wang <amwang@redhat.com>

---
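The lock ordering problem described above can be reproduced in miniature in
userspace. The following is only a sketch under stated assumptions, not
kernel code: a pthread mutex stands in for the RTNL lock, a thread stands in
for the queued igmp work, and pthread_join() stands in for flush_workqueue().
All names (fake_rtnl, fake_igmp_work, fake_dellink) are hypothetical. In the
flush-under-lock case the work uses trylock so that the demo reports the
conflict instead of hanging; the real work would block.

```c
#include <pthread.h>

/* Hypothetical userspace stand-in for the RTNL lock. */
static pthread_mutex_t fake_rtnl = PTHREAD_MUTEX_INITIALIZER;

struct fake_work {
	int probe_only;	/* 1: use trylock, so the demo reports instead of hanging */
	int ran;	/* set to 1 once the work has taken the lock and run */
};

/* Stand-in for the igmp_join/igmp_leave work, which needs the lock. */
static void *fake_igmp_work(void *arg)
{
	struct fake_work *w = arg;

	if (w->probe_only) {
		if (pthread_mutex_trylock(&fake_rtnl) != 0)
			return NULL;	/* lock busy: the real work would block here */
	} else {
		pthread_mutex_lock(&fake_rtnl);
	}
	w->ran = 1;
	pthread_mutex_unlock(&fake_rtnl);
	return NULL;
}

/*
 * Stand-in for vxlan_dellink(), entered with the lock held.
 * Returns 1 if the work ran, 0 if it could not, -1 if no thread was created.
 */
static int fake_dellink(int flush_under_lock)
{
	struct fake_work w = { .probe_only = flush_under_lock, .ran = 0 };
	pthread_t worker;

	pthread_mutex_lock(&fake_rtnl);
	if (pthread_create(&worker, NULL, fake_igmp_work, &w) != 0) {
		pthread_mutex_unlock(&fake_rtnl);
		return -1;
	}
	if (flush_under_lock) {
		pthread_join(worker, NULL);	/* "flush" while still holding the lock */
		pthread_mutex_unlock(&fake_rtnl);
	} else {
		pthread_mutex_unlock(&fake_rtnl);	/* release the lock first */
		pthread_join(worker, NULL);	/* then let the work finish */
	}
	return w.ran;
}
```

In this sketch fake_dellink(1) returns 0, because the work can never take the
lock while the caller waits for it, and fake_dellink(0) returns 1, because
the lock is released before the caller waits.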
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
stephen hemminger - Aug. 7, 2013, 3:31 p.m.
Thanks for testing this.

Acked-by: Stephen Hemminger <stephen@networkplumber.org>
David Miller - Aug. 9, 2013, 6:42 p.m.
Applied.

Patch

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 8bf31d9..c51ef9b 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -1837,8 +1837,6 @@  static void vxlan_dellink(struct net_device *dev, struct list_head *head)
 	struct vxlan_net *vn = net_generic(dev_net(dev), vxlan_net_id);
 	struct vxlan_dev *vxlan = netdev_priv(dev);
 
-	flush_workqueue(vxlan_wq);
-
 	spin_lock(&vn->sock_lock);
 	hlist_del_rcu(&vxlan->hlist);
 	spin_unlock(&vn->sock_lock);
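
With the flush removed, cleanup relies on the references that the commit
message says the queued work holds. A minimal userspace sketch of that
pattern follows, assuming a plain atomic counter. The type and helpers
(fake_dev, fake_dev_hold, fake_dev_put, fake_queue_work, fake_work_done,
fake_dellink_noflush) are hypothetical and are not the kernel's API.

```c
#include <stdatomic.h>

/* Hypothetical refcounted device; not the kernel's struct net_device. */
struct fake_dev {
	atomic_int refcnt;
	int released;	/* set to 1 when the last reference is dropped */
};

static void fake_dev_hold(struct fake_dev *dev)
{
	atomic_fetch_add(&dev->refcnt, 1);
}

/* Drop one reference; whoever drops the last one does the cleanup. */
static void fake_dev_put(struct fake_dev *dev)
{
	if (atomic_fetch_sub(&dev->refcnt, 1) == 1)
		dev->released = 1;
}

/* Queueing work pins the device so it outlives a concurrent removal. */
static void fake_queue_work(struct fake_dev *dev)
{
	fake_dev_hold(dev);
}

/* The work drops its reference when it finishes. */
static void fake_work_done(struct fake_dev *dev)
{
	fake_dev_put(dev);
}

/*
 * Removal without a flush: drop the owner's reference and return.
 * Returns 1 if the device was released at this point, 0 otherwise.
 */
static int fake_dellink_noflush(struct fake_dev *dev)
{
	fake_dev_put(dev);
	return dev->released;
}
```

Starting from a count of 1 with one work item queued, fake_dellink_noflush()
leaves the device alive, and the later fake_work_done() call is the one that
releases it. Removal therefore never has to wait for the work.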