diff mbox

[1/7] xfrm: remove policy lock when accessing policy->walk.dead

Message ID 1269871964-5412-2-git-send-email-timo.teras@iki.fi
State Changes Requested, archived
Delegated to: David Miller
Headers show

Commit Message

Timo Teras March 29, 2010, 2:12 p.m. UTC
All of the code considers ->dead as a hint that the cached policy
needs to get refreshed. The read side can just drop the read lock
without any side effects.

The write side needs to make sure that it's written only exactly
once. Only possible race is at xfrm_policy_kill(). This is fixed
by checking result of __xfrm_policy_unlink() when needed. It will
always succeed if the policy object is looked up from the hash
list (so some checks are removed), but it needs to be checked if
we are trying to unlink policy via a reference (appropriate
checks added).

Since policy->walk.dead is written exactly once, it no longer
needs to be protected with a write lock.

Signed-off-by: Timo Teras <timo.teras@iki.fi>
---
 net/xfrm/xfrm_policy.c |   30 +++++++-----------------------
 net/xfrm/xfrm_user.c   |    6 +-----
 2 files changed, 8 insertions(+), 28 deletions(-)

Comments

Herbert Xu March 29, 2010, 2:43 p.m. UTC | #1
On Mon, Mar 29, 2010 at 05:12:38PM +0300, Timo Teras wrote:
>
> @@ -1132,7 +1119,7 @@ int xfrm_sk_policy_insert(struct sock *sk, int dir, struct xfrm_policy *pol)
>  		__xfrm_policy_link(pol, XFRM_POLICY_MAX+dir);
>  	}
>  	if (old_pol)
> -		__xfrm_policy_unlink(old_pol, XFRM_POLICY_MAX+dir);
> +		old_pol = __xfrm_policy_unlink(old_pol, XFRM_POLICY_MAX+dir);
>  	write_unlock_bh(&xfrm_policy_lock);
>  
>  	if (old_pol) {

So when can this actually fail?

Otherwise this patch looks good to me.

Thanks,
Timo Teras March 30, 2010, 4:55 a.m. UTC | #2
Herbert Xu wrote:
> On Mon, Mar 29, 2010 at 05:12:38PM +0300, Timo Teras wrote:
>> @@ -1132,7 +1119,7 @@ int xfrm_sk_policy_insert(struct sock *sk, int dir, struct xfrm_policy *pol)
>>  		__xfrm_policy_link(pol, XFRM_POLICY_MAX+dir);
>>  	}
>>  	if (old_pol)
>> -		__xfrm_policy_unlink(old_pol, XFRM_POLICY_MAX+dir);
>> +		old_pol = __xfrm_policy_unlink(old_pol, XFRM_POLICY_MAX+dir);
>>  	write_unlock_bh(&xfrm_policy_lock);
>>  
>>  	if (old_pol) {
> 
> So when can this actually fail?

Considering that the socket reference is received from the sk->sk_policy,
and the hash bucket we use is "XFRM_POLICY_MAX+dir", it's non-obvious if
it can fail or not.

It would look like the timer can kill a policy and unlink it, but it
would still be found from sk_policy.

It probably doesn't really make sense to insert per-socket policy that
expires. But in case someone does something like that, I'd think we
need the above just to be sure.

Considering this, xfrm_sk_policy_lookup() should probably check the
dead flag, and cleanup sk_policy if it was killed by a timer.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Herbert Xu March 30, 2010, 11:53 a.m. UTC | #3
On Tue, Mar 30, 2010 at 07:55:07AM +0300, Timo Teräs wrote:
> Herbert Xu wrote:
>> On Mon, Mar 29, 2010 at 05:12:38PM +0300, Timo Teras wrote:
>>> @@ -1132,7 +1119,7 @@ int xfrm_sk_policy_insert(struct sock *sk, int dir, struct xfrm_policy *pol)
>>>  		__xfrm_policy_link(pol, XFRM_POLICY_MAX+dir);
>>>  	}
>>>  	if (old_pol)
>>> -		__xfrm_policy_unlink(old_pol, XFRM_POLICY_MAX+dir);
>>> +		old_pol = __xfrm_policy_unlink(old_pol, XFRM_POLICY_MAX+dir);
>>>  	write_unlock_bh(&xfrm_policy_lock);
>>>   	if (old_pol) {
>>
>> So when can this actually fail?
>
> Considering that the socket reference is received from the sk->sk_policy,
> and the hash bucket we use is "XFRM_POLICY_MAX+dir", it's non-obvious if
> it can fail or not.
>
> It would look like the timer can kill a policy and unlink it, but it
> would still be found from sk_policy.

Socket policies cannot expire.

In fact, they probably shouldn't even be on the bydst or any other
hash table.  I think the only reason they're there at all is because
the hash table was added to __xfrm_policy_link which happens to be
used by socket policies.

Cheers,
Timo Teras March 30, 2010, 12:04 p.m. UTC | #4
Herbert Xu wrote:
> On Tue, Mar 30, 2010 at 07:55:07AM +0300, Timo Teräs wrote:
>> Herbert Xu wrote:
>>> On Mon, Mar 29, 2010 at 05:12:38PM +0300, Timo Teras wrote:
>>>> @@ -1132,7 +1119,7 @@ int xfrm_sk_policy_insert(struct sock *sk, int dir, struct xfrm_policy *pol)
>>>>  		__xfrm_policy_link(pol, XFRM_POLICY_MAX+dir);
>>>>  	}
>>>>  	if (old_pol)
>>>> -		__xfrm_policy_unlink(old_pol, XFRM_POLICY_MAX+dir);
>>>> +		old_pol = __xfrm_policy_unlink(old_pol, XFRM_POLICY_MAX+dir);
>>>>  	write_unlock_bh(&xfrm_policy_lock);
>>>>   	if (old_pol) {
>>> So when can this actually fail?
>> Considering that the socket reference is received from the sk->sk_policy,
>> and the hash bucket we use is "XFRM_POLICY_MAX+dir", it's non-obvious if
>> it can fail or not.
>>
>> It would look like the timer can kill a policy and unlink it, but it
>> would still be found from sk_policy.
> 
> Socket policies cannot expire.

Was not aware of that. The above is not needed then.

> In fact, they probably shouldn't even be on the bydst or any other
> hash table.  I think the only reason they're there at all is because
> the hash table was added to __xfrm_policy_link which happens to be
> used by socket policies.

I think it's hashed so socket policies are included in the policy
db dumps and counts.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Herbert Xu March 30, 2010, 12:14 p.m. UTC | #5
On Tue, Mar 30, 2010 at 03:04:01PM +0300, Timo Teräs wrote:
>> I think it's hashed so socket policies are included in the policy
> db dumps and counts.

No we use a linked list for that.

Cheers,
Timo Teras March 30, 2010, 12:21 p.m. UTC | #6
Herbert Xu wrote:
> On Tue, Mar 30, 2010 at 03:04:01PM +0300, Timo Teräs wrote:
>>> I think it's hashed so socket policies are included in the policy
>> db dumps and counts.
> 
> No we use a linked list for that.

Ah, yes. It's just all done together in the __xfrm_link_policy().

Hmm... is it possible to modify/delete per-socket policies from
userland via xfrm_user? That would be also another race why
we'd need to check the unlinking result.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Herbert Xu March 30, 2010, 12:23 p.m. UTC | #7
On Tue, Mar 30, 2010 at 03:21:19PM +0300, Timo Teräs wrote:
>
> Hmm... is it possible to modify/delete per-socket policies from
> userland via xfrm_user? That would be also another race why
> we'd need to check the unlinking result.

How would the user be able to specify the socket?

The answer is no.

Cheers,
Timo Teras March 30, 2010, 12:41 p.m. UTC | #8
Herbert Xu wrote:
> On Tue, Mar 30, 2010 at 03:21:19PM +0300, Timo Teräs wrote:
>> Hmm... is it possible to modify/delete per-socket policies from
>> userland via xfrm_user? That would be also another race why
>> we'd need to check the unlinking result.
> 
> How would the user be able to specify the socket?
> 
> The answer is no.

I thought it was possible since they can be dumped.

But yes, I now see that the policy direction check does not
allow the per-socket policies to be specified.

So it'd make more sense to nuke the hashes entirely for
per-socket policies?
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Herbert Xu March 30, 2010, 12:48 p.m. UTC | #9
On Tue, Mar 30, 2010 at 03:41:02PM +0300, Timo Teräs wrote:
>
> So it'd make more sense to nuke the hashes entirely for
> per-socket policies?

Absolutely.
Timo Teras March 30, 2010, 1:33 p.m. UTC | #10
Herbert Xu wrote:
> On Tue, Mar 30, 2010 at 03:41:02PM +0300, Timo Teräs wrote:
>> So it'd make more sense to nuke the hashes entirely for
>> per-socket policies?
> 
> Absolutely.

I checked now the xfrm_user, and mostly it seems to prevent
modification to per-socket policies.

The only exception is XFRM_MSG_POLEXPIRE handler
xfrm_add_pol_expire(). It calls xfrm_policy_byid() without
verifying the direction, and can thus complete successfully on
a per-socket policy. This can actually result in per-socket
policy deletion via netlink.

I guess the proper thing is to add the direction check there.

It also seems that the by-index hash is also used when
generating new index. It's to double check that the index
is unique. So deleting the by-index hash from per-socket
policies seems tricky.

Removing bydst hashing should be trivial.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Timo Teras March 30, 2010, 2:01 p.m. UTC | #11
Timo Teräs wrote:
> Herbert Xu wrote:
>> On Tue, Mar 30, 2010 at 07:55:07AM +0300, Timo Teräs wrote:
>>> Herbert Xu wrote:
>>>> On Mon, Mar 29, 2010 at 05:12:38PM +0300, Timo Teras wrote:
>>>>> @@ -1132,7 +1119,7 @@ int xfrm_sk_policy_insert(struct sock *sk, 
>>>>> int dir, struct xfrm_policy *pol)
>>>>>          __xfrm_policy_link(pol, XFRM_POLICY_MAX+dir);
>>>>>      }
>>>>>      if (old_pol)
>>>>> -        __xfrm_policy_unlink(old_pol, XFRM_POLICY_MAX+dir);
>>>>> +        old_pol = __xfrm_policy_unlink(old_pol, XFRM_POLICY_MAX+dir);
>>>>>      write_unlock_bh(&xfrm_policy_lock);
>>>>>       if (old_pol) {
>>>> So when can this actually fail?
>>> Considering that the socket reference is received from the 
>>> sk->sk_policy,
>>> and the hash bucket we use is "XFRM_POLICY_MAX+dir", it's non-obvious if
>>> it can fail or not.
>>>
>>> It would look like the timer can kill a policy and unlink it, but it
>>> would still be found from sk_policy.
>>
>> Socket policies cannot expire.
> 
> Was not aware of that. The above is not needed then.

Since the exported function xfrm_policy_byid() can result in deletion
of socket policy, it's safer to leave this change in. This is can be
even triggered via xfrm_user since it does not check 'dir' for the
policy expired message it handles. Any custom module could do similar
harm.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Herbert Xu March 30, 2010, 2:29 p.m. UTC | #12
On Tue, Mar 30, 2010 at 05:01:47PM +0300, Timo Teräs wrote:
>
> Since the exported function xfrm_policy_byid() can result in deletion
> of socket policy, it's safer to leave this change in. This is can be
> even triggered via xfrm_user since it does not check 'dir' for the
> policy expired message it handles. Any custom module could do similar
> harm.

That's a bug.  Socket policies should never be deleted by anyone
other than the socket owner through a setsockopt.

Cheers,
Herbert Xu March 30, 2010, 2:30 p.m. UTC | #13
On Tue, Mar 30, 2010 at 04:33:52PM +0300, Timo Teräs wrote:
> Herbert Xu wrote:
>> On Tue, Mar 30, 2010 at 03:41:02PM +0300, Timo Teräs wrote:
>>> So it'd make more sense to nuke the hashes entirely for
>>> per-socket policies?
>>
>> Absolutely.
>
> I checked now the xfrm_user, and mostly it seems to prevent
> modification to per-socket policies.
>
> The only exception is XFRM_MSG_POLEXPIRE handler
> xfrm_add_pol_expire(). It calls xfrm_policy_byid() without
> verifying the direction, and can thus complete successfully on
> a per-socket policy. This can actually result in per-socket
> policy deletion via netlink.

That shouldn't be possible since the directions used by socket
policies cannot be set through xfrm_user.

Cheers,
Herbert Xu March 30, 2010, 2:34 p.m. UTC | #14
On Tue, Mar 30, 2010 at 10:30:48PM +0800, Herbert Xu wrote:
> On Tue, Mar 30, 2010 at 04:33:52PM +0300, Timo Teräs wrote:
> > Herbert Xu wrote:
> >> On Tue, Mar 30, 2010 at 03:41:02PM +0300, Timo Teräs wrote:
> >>> So it'd make more sense to nuke the hashes entirely for
> >>> per-socket policies?
> >>
> >> Absolutely.
> >
> > I checked now the xfrm_user, and mostly it seems to prevent
> > modification to per-socket policies.
> >
> > The only exception is XFRM_MSG_POLEXPIRE handler
> > xfrm_add_pol_expire(). It calls xfrm_policy_byid() without
> > verifying the direction, and can thus complete successfully on
> > a per-socket policy. This can actually result in per-socket
> > policy deletion via netlink.
> 
> That shouldn't be possible since the directions used by socket
> policies cannot be set through xfrm_user.

If there is a missing direction check then we should add it.
Herbert Xu March 30, 2010, 2:37 p.m. UTC | #15
On Tue, Mar 30, 2010 at 04:33:52PM +0300, Timo Teräs wrote:
>
> It also seems that the by-index hash is also used when
> generating new index. It's to double check that the index
> is unique. So deleting the by-index hash from per-socket
> policies seems tricky.
>
> Removing bydst hashing should be trivial.

Yes that makes sense.
Timo Teras March 30, 2010, 3:36 p.m. UTC | #16
Herbert Xu wrote:
> On Tue, Mar 30, 2010 at 05:01:47PM +0300, Timo Teräs wrote:
>> Since the exported function xfrm_policy_byid() can result in deletion
>> of socket policy, it's safer to leave this change in. This is can be
>> even triggered via xfrm_user since it does not check 'dir' for the
>> policy expired message it handles. Any custom module could do similar
>> harm.
> 
> That's a bug.  Socket policies should never be deleted by anyone
> other than the socket owner through a setsockopt.

Ok. I can remove the change to xfrm_sk_policy_insert() if you want.

I only added it because it's non-trivial to figure out if there's
any code path that could race. It's a great help for reader of the
code to see that it's correct even if it's not strictly needed.
My initial feeling is that the change produces better code as the
original 'old_pol' does not have to get stored and restored.

I'm fine either way.

I will also include a patch to fix the missing 'dir' check in
xfrm_user.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Herbert Xu March 31, 2010, 12:43 a.m. UTC | #17
On Tue, Mar 30, 2010 at 06:36:37PM +0300, Timo Teräs wrote:
>
> I only added it because it's non-trivial to figure out if there's
> any code path that could race. It's a great help for reader of the
> code to see that it's correct even if it's not strictly needed.

No that's bad because you're misleading people into thinking
something that isn't allowed can happen.  It's much better to
add a comment instead.

> I will also include a patch to fix the missing 'dir' check in
> xfrm_user.

Thanks,
diff mbox

Patch

diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index 843e066..595d347 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -156,7 +156,7 @@  static void xfrm_policy_timer(unsigned long data)
 
 	read_lock(&xp->lock);
 
-	if (xp->walk.dead)
+	if (unlikely(xp->walk.dead))
 		goto out;
 
 	dir = xfrm_policy_id2dir(xp->index);
@@ -297,17 +297,7 @@  static DECLARE_WORK(xfrm_policy_gc_work, xfrm_policy_gc_task);
 
 static void xfrm_policy_kill(struct xfrm_policy *policy)
 {
-	int dead;
-
-	write_lock_bh(&policy->lock);
-	dead = policy->walk.dead;
 	policy->walk.dead = 1;
-	write_unlock_bh(&policy->lock);
-
-	if (unlikely(dead)) {
-		WARN_ON(1);
-		return;
-	}
 
 	spin_lock_bh(&xfrm_policy_gc_lock);
 	hlist_add_head(&policy->bydst, &xfrm_policy_gc_list);
@@ -776,7 +766,6 @@  xfrm_policy_flush_secctx_check(struct net *net, u8 type, struct xfrm_audit *audi
 int xfrm_policy_flush(struct net *net, u8 type, struct xfrm_audit *audit_info)
 {
 	int dir, err = 0, cnt = 0;
-	struct xfrm_policy *dp;
 
 	write_lock_bh(&xfrm_policy_lock);
 
@@ -794,10 +783,9 @@  int xfrm_policy_flush(struct net *net, u8 type, struct xfrm_audit *audit_info)
 				     &net->xfrm.policy_inexact[dir], bydst) {
 			if (pol->type != type)
 				continue;
-			dp = __xfrm_policy_unlink(pol, dir);
+			__xfrm_policy_unlink(pol, dir);
 			write_unlock_bh(&xfrm_policy_lock);
-			if (dp)
-				cnt++;
+			cnt++;
 
 			xfrm_audit_policy_delete(pol, 1, audit_info->loginuid,
 						 audit_info->sessionid,
@@ -816,10 +804,9 @@  int xfrm_policy_flush(struct net *net, u8 type, struct xfrm_audit *audit_info)
 					     bydst) {
 				if (pol->type != type)
 					continue;
-				dp = __xfrm_policy_unlink(pol, dir);
+				__xfrm_policy_unlink(pol, dir);
 				write_unlock_bh(&xfrm_policy_lock);
-				if (dp)
-					cnt++;
+				cnt++;
 
 				xfrm_audit_policy_delete(pol, 1,
 							 audit_info->loginuid,
@@ -1132,7 +1119,7 @@  int xfrm_sk_policy_insert(struct sock *sk, int dir, struct xfrm_policy *pol)
 		__xfrm_policy_link(pol, XFRM_POLICY_MAX+dir);
 	}
 	if (old_pol)
-		__xfrm_policy_unlink(old_pol, XFRM_POLICY_MAX+dir);
+		old_pol = __xfrm_policy_unlink(old_pol, XFRM_POLICY_MAX+dir);
 	write_unlock_bh(&xfrm_policy_lock);
 
 	if (old_pol) {
@@ -1737,11 +1724,8 @@  restart:
 			goto error;
 		}
 
-		for (pi = 0; pi < npols; pi++) {
-			read_lock_bh(&pols[pi]->lock);
+		for (pi = 0; pi < npols; pi++)
 			pol_dead |= pols[pi]->walk.dead;
-			read_unlock_bh(&pols[pi]->lock);
-		}
 
 		write_lock_bh(&policy->lock);
 		if (unlikely(pol_dead || stale_bundle(dst))) {
diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index 6106b72..778a6ec 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -1766,13 +1766,9 @@  static int xfrm_add_pol_expire(struct sk_buff *skb, struct nlmsghdr *nlh,
 	if (xp == NULL)
 		return -ENOENT;
 
-	read_lock(&xp->lock);
-	if (xp->walk.dead) {
-		read_unlock(&xp->lock);
+	if (unlikely(xp->walk.dead))
 		goto out;
-	}
 
-	read_unlock(&xp->lock);
 	err = 0;
 	if (up->hard) {
 		uid_t loginuid = NETLINK_CB(skb).loginuid;