Patchwork [net] vxlan: add necessary locking on device removal

Submitter Stephen Hemminger
Date July 13, 2013, 5:18 p.m.
Message ID <20130713101818.0d4fa8e0@nehalam.linuxnetplumber.net>
Permalink /patch/258836/
State Accepted
Delegated to: David Miller

Comments

stephen hemminger - July 13, 2013, 5:18 p.m.
The socket management is now done in a workqueue (outside of RTNL)
and protected by vn->sock_lock. There were two possible bugs: first,
the vxlan device was removed from the per-socket VNI hash table
without holding the lock; second, there was a race where the
workqueue could still run after the device had been deleted.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Pravin B Shelar - July 13, 2013, 7:21 p.m.
On Sat, Jul 13, 2013 at 10:18 AM, Stephen Hemminger
<stephen@networkplumber.org> wrote:
> The socket management is now done in workqueue (outside of RTNL)
> and protected by vn->sock_lock. There were two possible bugs, first
> the vxlan device was removed from the VNI hash table per socket without
> holding lock. And there was a race when device is created and the workqueue
> could run after deletion.
>
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
>
> --- a/drivers/net/vxlan.c       2013-07-08 16:31:50.080744429 -0700
> +++ b/drivers/net/vxlan.c       2013-07-10 20:15:47.337653899 -0700
> @@ -1767,9 +1767,15 @@ static int vxlan_newlink(struct net *net
>
>  static void vxlan_dellink(struct net_device *dev, struct list_head *head)
>  {
> +       struct vxlan_net *vn = net_generic(dev_net(dev), vxlan_net_id);
>         struct vxlan_dev *vxlan = netdev_priv(dev);
>
> +       flush_workqueue(vxlan_wq);
> +
Doesn't this create a dependency on the sock_work thread while
holding RTNL? If so, it can result in a deadlock.

> +       spin_lock(&vn->sock_lock);
>         hlist_del_rcu(&vxlan->hlist);
> +       spin_unlock(&vn->sock_lock);
> +
>         list_del(&vxlan->next);
>         unregister_netdevice_queue(dev, head);
>  }
David Miller - July 16, 2013, 6:28 p.m.
From: Pravin Shelar <pshelar@nicira.com>
Date: Sat, 13 Jul 2013 12:21:52 -0700

> On Sat, Jul 13, 2013 at 10:18 AM, Stephen Hemminger
> <stephen@networkplumber.org> wrote:
>> The socket management is now done in workqueue (outside of RTNL)
>> and protected by vn->sock_lock. There were two possible bugs, first
>> the vxlan device was removed from the VNI hash table per socket without
>> holding lock. And there was a race when device is created and the workqueue
>> could run after deletion.
>>
>> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
>>
>> --- a/drivers/net/vxlan.c       2013-07-08 16:31:50.080744429 -0700
>> +++ b/drivers/net/vxlan.c       2013-07-10 20:15:47.337653899 -0700
>> @@ -1767,9 +1767,15 @@ static int vxlan_newlink(struct net *net
>>
>>  static void vxlan_dellink(struct net_device *dev, struct list_head *head)
>>  {
>> +       struct vxlan_net *vn = net_generic(dev_net(dev), vxlan_net_id);
>>         struct vxlan_dev *vxlan = netdev_priv(dev);
>>
>> +       flush_workqueue(vxlan_wq);
>> +
> Doesn't this create dependency on sock_work thread while holding RTNL.
> If so it can result in deadlock.

What exact deadlock do you perceive?  I don't see any code path in the
sock_work handler (vxlan_sock_work) which takes the RTNL mutex.

So we should be able to safely flush any pending sock_work jobs from
vxlan_dellink().  The fact that vxlan_dellink() runs with the RTNL
mutex shouldn't cause any issues.
Pravin B Shelar - July 16, 2013, 8:29 p.m.
On Tue, Jul 16, 2013 at 11:28 AM, David Miller <davem@davemloft.net> wrote:
> From: Pravin Shelar <pshelar@nicira.com>
> Date: Sat, 13 Jul 2013 12:21:52 -0700
>
>> On Sat, Jul 13, 2013 at 10:18 AM, Stephen Hemminger
>> <stephen@networkplumber.org> wrote:
>>> The socket management is now done in workqueue (outside of RTNL)
>>> and protected by vn->sock_lock. There were two possible bugs, first
>>> the vxlan device was removed from the VNI hash table per socket without
>>> holding lock. And there was a race when device is created and the workqueue
>>> could run after deletion.
>>>
>>> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
>>>
>>> --- a/drivers/net/vxlan.c       2013-07-08 16:31:50.080744429 -0700
>>> +++ b/drivers/net/vxlan.c       2013-07-10 20:15:47.337653899 -0700
>>> @@ -1767,9 +1767,15 @@ static int vxlan_newlink(struct net *net
>>>
>>>  static void vxlan_dellink(struct net_device *dev, struct list_head *head)
>>>  {
>>> +       struct vxlan_net *vn = net_generic(dev_net(dev), vxlan_net_id);
>>>         struct vxlan_dev *vxlan = netdev_priv(dev);
>>>
>>> +       flush_workqueue(vxlan_wq);
>>> +
>> Doesn't this create dependency on sock_work thread while holding RTNL.
>> If so it can result in deadlock.
>
> What exact deadlock do you perceive?  I don't see any code path in the
> sock_work handler (vxlan_sock_work) which takes the RTNL mutex.
>
> So we should be able to safely flush any pending sock_work jobs from
> vxlan_dellink().  The fact that vxlan_dellink() runs with the RTNL
> mutex shouldn't cause any issues.

commit 1c51a9159ddefa51 ("vxlan: fix race caused by dropping
rtnl_unlock") moved socket creation to the sock_work workqueue. That's
why I thought sock_work takes RTNL. But if it does not take RTNL in
any case, then why is vxlan socket creation deferred to sock_work?
David Miller - July 17, 2013, 6:06 a.m.
From: Pravin Shelar <pshelar@nicira.com>
Date: Tue, 16 Jul 2013 13:29:08 -0700

> commit 1c51a9159ddefa51 (vxlan: fix race caused by dropping
> rtnl_unlock) moved sock-create to sock_work workq. Thats why I thought
> sock_work takes RTNL. but if it is not taking RTNL in any case, they
> why vxlan sock create is deferred to sock_work?

That commit is handling the fact that the RTNL mutex is NOT held
during the create operation in question.

It defers to a workqueue so that the new ->sock_lock can be taken
in the appropriate context.
Pravin B Shelar - July 17, 2013, 3:41 p.m.
On Tue, Jul 16, 2013 at 11:06 PM, David Miller <davem@davemloft.net> wrote:
> From: Pravin Shelar <pshelar@nicira.com>
> Date: Tue, 16 Jul 2013 13:29:08 -0700
>
>> commit 1c51a9159ddefa51 (vxlan: fix race caused by dropping
>> rtnl_unlock) moved sock-create to sock_work workq. Thats why I thought
>> sock_work takes RTNL. but if it is not taking RTNL in any case, they
>> why vxlan sock create is deferred to sock_work?
>
> That commit is handling the fact that the RTNL mutex is NOT held
> during the create operation in question.
>
> It defers to a workqueue so that the new ->sock_lock can be taken
> in the appropriate context.

OK, thanks for the explanation. I do not see any problem with the patch.
David Miller - July 17, 2013, 7:09 p.m.
From: Pravin Shelar <pshelar@nicira.com>
Date: Wed, 17 Jul 2013 08:41:33 -0700

> On Tue, Jul 16, 2013 at 11:06 PM, David Miller <davem@davemloft.net> wrote:
>> From: Pravin Shelar <pshelar@nicira.com>
>> Date: Tue, 16 Jul 2013 13:29:08 -0700
>>
>>> commit 1c51a9159ddefa51 (vxlan: fix race caused by dropping
>>> rtnl_unlock) moved sock-create to sock_work workq. Thats why I thought
>>> sock_work takes RTNL. but if it is not taking RTNL in any case, they
>>> why vxlan sock create is deferred to sock_work?
>>
>> That commit is handling the fact that the RTNL mutex is NOT held
>> during the create operation in question.
>>
>> It defers to a workqueue so that the new ->sock_lock can be taken
>> in the appropriate context.
> 
> ok, Thanks for explanation. I do not see any problem with the patch.

Great, I'm going to apply Stephen's patch then.

Thanks.
David Miller - July 17, 2013, 7:51 p.m.
From: Stephen Hemminger <stephen@networkplumber.org>
Date: Sat, 13 Jul 2013 10:18:18 -0700

> The socket management is now done in workqueue (outside of RTNL)
> and protected by vn->sock_lock. There were two possible bugs, first
> the vxlan device was removed from the VNI hash table per socket without
> holding lock. And there was a race when device is created and the workqueue
> could run after deletion.
> 
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

Applied.

Patch

--- a/drivers/net/vxlan.c	2013-07-08 16:31:50.080744429 -0700
+++ b/drivers/net/vxlan.c	2013-07-10 20:15:47.337653899 -0700
@@ -1767,9 +1767,15 @@  static int vxlan_newlink(struct net *net
 
 static void vxlan_dellink(struct net_device *dev, struct list_head *head)
 {
+	struct vxlan_net *vn = net_generic(dev_net(dev), vxlan_net_id);
 	struct vxlan_dev *vxlan = netdev_priv(dev);
 
+	flush_workqueue(vxlan_wq);
+
+	spin_lock(&vn->sock_lock);
 	hlist_del_rcu(&vxlan->hlist);
+	spin_unlock(&vn->sock_lock);
+
 	list_del(&vxlan->next);
 	unregister_netdevice_queue(dev, head);
 }