diff mbox series

[net-next,v2,2/5] net: Revert "ipv4: fix a deadlock in ip_ra_control"

Message ID 152145089432.7718.3981942805167545803.stgit@localhost.localdomain
State Changes Requested, archived
Delegated to: David Miller
Headers show
Series Rework ip_ra_chain protection | expand

Commit Message

Kirill Tkhai March 19, 2018, 9:14 a.m. UTC
This reverts commit 1215e51edad1.
Since raw_close() is used on every RAW socket destruction,
the changes made by 1215e51edad1 scale sadly. This clearly
seen on endless unshare(CLONE_NEWNET) test, and cleanup_net()
kwork spends a lot of time waiting for rtnl_lock() introduced
by this commit.

Next patches in series will rework this in another way,
so now we revert 1215e51edad1. Also, it doesn't seen
mrtsock_destruct() takes sk_lock, and the comment to the commit
does not show the actual stack dump. So, there is a question
did we really need in it.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
---
 net/ipv4/ip_sockglue.c |    1 -
 net/ipv4/ipmr.c        |   11 +++++++++--
 net/ipv4/raw.c         |    2 --
 3 files changed, 9 insertions(+), 5 deletions(-)

Comments

David Miller March 20, 2018, 4:23 p.m. UTC | #1
From: Kirill Tkhai <ktkhai@virtuozzo.com>
Date: Mon, 19 Mar 2018 12:14:54 +0300

> This reverts commit 1215e51edad1.
> Since raw_close() is used on every RAW socket destruction,
> the changes made by 1215e51edad1 scale sadly. This clearly
> seen on endless unshare(CLONE_NEWNET) test, and cleanup_net()
> kwork spends a lot of time waiting for rtnl_lock() introduced
> by this commit.
> 
> Next patches in series will rework this in another way,
> so now we revert 1215e51edad1. Also, it doesn't seen
> mrtsock_destruct() takes sk_lock, and the comment to the commit
> does not show the actual stack dump. So, there is a question
> did we really need in it.
> 
> Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>

Kirill, I think the commit you are reverting is legitimate.

The IP_RAW_CONTROL path has an ABBA deadlock with other paths once
you revert this, so you are reintroducing a bug.

All code paths that must take both RTNL and the socket lock must
do them in the same order.  And that order is RTNL then socket
lock.

But you are breaking that here by getting us back into a state
where IP_RAW_CONTROL setsockopt will take the socket lock and
then RTNL.

Again, we can't take, or retake, RTNL if we have the socket lock
currently.

The only valid locking order is socket lock then RTNL.

Thank you.
Kirill Tkhai March 20, 2018, 7:25 p.m. UTC | #2
Hi, David,

thanks for the review!

On 20.03.2018 19:23, David Miller wrote:
> From: Kirill Tkhai <ktkhai@virtuozzo.com>
> Date: Mon, 19 Mar 2018 12:14:54 +0300
> 
>> This reverts commit 1215e51edad1.
>> Since raw_close() is used on every RAW socket destruction,
>> the changes made by 1215e51edad1 scale sadly. This clearly
>> seen on endless unshare(CLONE_NEWNET) test, and cleanup_net()
>> kwork spends a lot of time waiting for rtnl_lock() introduced
>> by this commit.
>>
>> Next patches in series will rework this in another way,
>> so now we revert 1215e51edad1. Also, it doesn't seen
>> mrtsock_destruct() takes sk_lock, and the comment to the commit
>> does not show the actual stack dump. So, there is a question
>> did we really need in it.
>>
>> Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
> 
> Kirill, I think the commit you are reverting is legitimate.
> 
> The IP_RAW_CONTROL path has an ABBA deadlock with other paths once
> you revert this, so you are reintroducing a bug.

The talk is about IP_ROUTER_ALERT, I assume there is just an erratum.
 
> All code paths that must take both RTNL and the socket lock must
> do them in the same order.  And that order is RTNL then socket
> lock.

The place I change in this patch is IP_ROUTER_ALERT. There is only
a call of ip_ra_control(), while this function does not need socket
lock. Please, see next patch. It moves this ip_ra_control() out
of socket lock. And it fixes the problem pointed in reverted patch
in another way. So, if there is ABBA, after next patch it becomes
solved. Does this mean I have to merge [2/5] and [3/5] together?

> But you are breaking that here by getting us back into a state
> where IP_RAW_CONTROL setsockopt will take the socket lock and
> then RTNL.
> 
> Again, we can't take, or retake, RTNL if we have the socket lock
> currently.
> 
> The only valid locking order is socket lock then RTNL.

Thanks,
Kirill
Kirill Tkhai March 20, 2018, 9:50 p.m. UTC | #3
On 20.03.2018 22:25, Kirill Tkhai wrote:
> Hi, David,
> 
> thanks for the review!
> 
> On 20.03.2018 19:23, David Miller wrote:
>> From: Kirill Tkhai <ktkhai@virtuozzo.com>
>> Date: Mon, 19 Mar 2018 12:14:54 +0300
>>
>>> This reverts commit 1215e51edad1.
>>> Since raw_close() is used on every RAW socket destruction,
>>> the changes made by 1215e51edad1 scale sadly. This clearly
>>> seen on endless unshare(CLONE_NEWNET) test, and cleanup_net()
>>> kwork spends a lot of time waiting for rtnl_lock() introduced
>>> by this commit.
>>>
>>> Next patches in series will rework this in another way,
>>> so now we revert 1215e51edad1. Also, it doesn't seen
>>> mrtsock_destruct() takes sk_lock, and the comment to the commit
>>> does not show the actual stack dump. So, there is a question
>>> did we really need in it.
>>>
>>> Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
>>
>> Kirill, I think the commit you are reverting is legitimate.
>>
>> The IP_RAW_CONTROL path has an ABBA deadlock with other paths once
>> you revert this, so you are reintroducing a bug.
> 
> The talk is about IP_ROUTER_ALERT, I assume there is just an erratum.
>  
>> All code paths that must take both RTNL and the socket lock must
>> do them in the same order.  And that order is RTNL then socket
>> lock.
> 
> The place I change in this patch is IP_ROUTER_ALERT. There is only
> a call of ip_ra_control(), while this function does not need socket
> lock. Please, see next patch. It moves this ip_ra_control() out
> of socket lock. And it fixes the problem pointed in reverted patch
> in another way. So, if there is ABBA, after next patch it becomes
> solved. Does this mean I have to merge [2/5] and [3/5] together?

We also can just change the order of patches, and make [3/5] go before [2/5].
Then, the kernel still remains bisectable. How do you think about this?

Thanks,
Kirill

>> But you are breaking that here by getting us back into a state
>> where IP_RAW_CONTROL setsockopt will take the socket lock and
>> then RTNL.
>>
>> Again, we can't take, or retake, RTNL if we have the socket lock
>> currently.
>>
>> The only valid locking order is socket lock then RTNL.
> 
> Thanks,
> Kirill
>
David Miller March 22, 2018, 6:41 p.m. UTC | #4
From: Kirill Tkhai <ktkhai@virtuozzo.com>
Date: Tue, 20 Mar 2018 22:25:35 +0300

> On 20.03.2018 19:23, David Miller wrote:
>> From: Kirill Tkhai <ktkhai@virtuozzo.com>
>> Date: Mon, 19 Mar 2018 12:14:54 +0300
>> 
>>> This reverts commit 1215e51edad1.
>>> Since raw_close() is used on every RAW socket destruction,
>>> the changes made by 1215e51edad1 scale sadly. This clearly
>>> seen on endless unshare(CLONE_NEWNET) test, and cleanup_net()
>>> kwork spends a lot of time waiting for rtnl_lock() introduced
>>> by this commit.
>>>
>>> Next patches in series will rework this in another way,
>>> so now we revert 1215e51edad1. Also, it doesn't seen
>>> mrtsock_destruct() takes sk_lock, and the comment to the commit
>>> does not show the actual stack dump. So, there is a question
>>> did we really need in it.
>>>
>>> Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
>> 
>> Kirill, I think the commit you are reverting is legitimate.
>> 
>> The IP_RAW_CONTROL path has an ABBA deadlock with other paths once
>> you revert this, so you are reintroducing a bug.
> 
> The talk is about IP_ROUTER_ALERT, I assume there is just an erratum.

My bad, I did indeed mean IP_ROUTER_ALERT.

>> All code paths that must take both RTNL and the socket lock must
>> do them in the same order.  And that order is RTNL then socket
>> lock.
> 
> The place I change in this patch is IP_ROUTER_ALERT. There is only
> a call of ip_ra_control(), while this function does not need socket
> lock. Please, see next patch. It moves this ip_ra_control() out
> of socket lock. And it fixes the problem pointed in reverted patch
> in another way. So, if there is ABBA, after next patch it becomes
> solved. Does this mean I have to merge [2/5] and [3/5] together?

Yes, that is what should happen, because the revert by itself
reintroduces the potential ABBA deadlock between the socket lock
and the RTNL mutex.

I'll take a look at the new version of your series.

Thank you.
diff mbox series

Patch

diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
index be7c3b71914d..b7bac7351409 100644
--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -594,7 +594,6 @@  static bool setsockopt_needs_rtnl(int optname)
 	case MCAST_LEAVE_GROUP:
 	case MCAST_LEAVE_SOURCE_GROUP:
 	case MCAST_UNBLOCK_SOURCE:
-	case IP_ROUTER_ALERT:
 		return true;
 	}
 	return false;
diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
index d752a70855d8..f6be5db16da2 100644
--- a/net/ipv4/ipmr.c
+++ b/net/ipv4/ipmr.c
@@ -1399,7 +1399,7 @@  static void mrtsock_destruct(struct sock *sk)
 	struct net *net = sock_net(sk);
 	struct mr_table *mrt;
 
-	ASSERT_RTNL();
+	rtnl_lock();
 	ipmr_for_each_table(mrt, net) {
 		if (sk == rtnl_dereference(mrt->mroute_sk)) {
 			IPV4_DEVCONF_ALL(net, MC_FORWARDING)--;
@@ -1411,6 +1411,7 @@  static void mrtsock_destruct(struct sock *sk)
 			mroute_clean_tables(mrt, false);
 		}
 	}
+	rtnl_unlock();
 }
 
 /* Socket options and virtual interface manipulation. The whole
@@ -1475,8 +1476,13 @@  int ip_mroute_setsockopt(struct sock *sk, int optname, char __user *optval,
 		if (sk != rcu_access_pointer(mrt->mroute_sk)) {
 			ret = -EACCES;
 		} else {
+			/* We need to unlock here because mrtsock_destruct takes
+			 * care of rtnl itself and we can't change that due to
+			 * the IP_ROUTER_ALERT setsockopt which runs without it.
+			 */
+			rtnl_unlock();
 			ret = ip_ra_control(sk, 0, NULL);
-			goto out_unlock;
+			goto out;
 		}
 		break;
 	case MRT_ADD_VIF:
@@ -1588,6 +1594,7 @@  int ip_mroute_setsockopt(struct sock *sk, int optname, char __user *optval,
 	}
 out_unlock:
 	rtnl_unlock();
+out:
 	return ret;
 }
 
diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
index 54648d20bf0f..720bef7da2f6 100644
--- a/net/ipv4/raw.c
+++ b/net/ipv4/raw.c
@@ -711,9 +711,7 @@  static void raw_close(struct sock *sk, long timeout)
 	/*
 	 * Raw sockets may have direct kernel references. Kill them.
 	 */
-	rtnl_lock();
 	ip_ra_control(sk, 0, NULL);
-	rtnl_unlock();
 
 	sk_common_release(sk);
 }