Patchwork SCTP: fix race between sctp_bind_addr_free() and sctp_bind_addr_conflict()

login
register
mail settings
Submitter Wei Yongjun
Date May 18, 2011, 9:02 a.m.
Message ID <4DD38B30.9090601@cn.fujitsu.com>
Download mbox | patch
Permalink /patch/96138/
State Superseded
Delegated to: David Miller
Headers show

Comments

Wei Yongjun - May 18, 2011, 9:02 a.m.
> Le mercredi 18 mai 2011 à 10:06 +0200, Jacek Luczak a écrit :
>> 2011/5/18 Eric Dumazet <eric.dumazet@gmail.com>:
>>> If you're removing items from this list, you must be a writer here, with
>>> exclusive access. So rcu_read_lock()/rcu_read_unlock() is not necessary.
>> I could agree to some extend ... but strict RCU section IMO is needed here.
>> I can check this if the issue exists.
>>
> I can tell you for sure rcu_read_lock() is not needed here. Its only
> showing confusion from code's author.
>
> Please read Documentation/RCU/listRCU.txt for concise explanations,
> line 117.
>
>
>>> Therefore, I guess following code is better :
>>>
>>> list_for_each_entry(addr, &bp->address_list, list) {
>>>        list_del_rcu(&addr->list);
>>>        call_rcu(&addr->rcu, sctp_local_addr_free);
>>>        SCTP_DBG_OBJCNT_DEC(addr);
>>> }
>>>
>>> Then, why dont you fix sctp_bind_addr_clean() instead ?
>>>
>>> if 'struct sctp_sockaddr_entry' is recu protected, then all frees should
>>> be protected as well.
>> The _clean() as claimed by Vlad is called many times from various places
>> in code and this could give a overhead. I guess Vlad would need to comment.
> I guess a full review of this code is needed. You'll have to prove
> sctp_bind_addr_clean() is always called after one RCU grace period if
> you want to leave it as is.
>
> You cant get RCU for free, some rules must be followed or you risk
> crashes.
>

fix the race between sctp_bind_addr_free() and sctp_bind_addr_conflict(), maybe you just
need to remove the socket from the port hash before empty the bind address list.
some thing like this, not test.



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jacek Luczak - May 18, 2011, 11:01 a.m.
2011/5/18 Wei Yongjun <yjwei@cn.fujitsu.com>:
>
>> Le mercredi 18 mai 2011 à 10:06 +0200, Jacek Luczak a écrit :
>>> 2011/5/18 Eric Dumazet <eric.dumazet@gmail.com>:
>>>> If you're removing items from this list, you must be a writer here, with
>>>> exclusive access. So rcu_read_lock()/rcu_read_unlock() is not necessary.
>>> I could agree to some extend ... but strict RCU section IMO is needed here.
>>> I can check this if the issue exists.
>>>
>> I can tell you for sure rcu_read_lock() is not needed here. Its only
>> showing confusion from code's author.
>>
>> Please read Documentation/RCU/listRCU.txt for concise explanations,
>> line 117.
>>
>>
>>>> Therefore, I guess following code is better :
>>>>
>>>> list_for_each_entry(addr, &bp->address_list, list) {
>>>>        list_del_rcu(&addr->list);
>>>>        call_rcu(&addr->rcu, sctp_local_addr_free);
>>>>        SCTP_DBG_OBJCNT_DEC(addr);
>>>> }
>>>>
>>>> Then, why dont you fix sctp_bind_addr_clean() instead ?
>>>>
>>>> if 'struct sctp_sockaddr_entry' is recu protected, then all frees should
>>>> be protected as well.
>>> The _clean() as claimed by Vlad is called many times from various places
>>> in code and this could give a overhead. I guess Vlad would need to comment.
>> I guess a full review of this code is needed. You'll have to prove
>> sctp_bind_addr_clean() is always called after one RCU grace period if
>> you want to leave it as is.
>>
>> You cant get RCU for free, some rules must be followed or you risk
>> crashes.
>>
>
> fix the race between sctp_bind_addr_free() and sctp_bind_addr_conflict(), maybe you just
> need to remove the socket from the port hash before empty the bind address list.
> some thing like this, not test.
>
> diff --git a/net/sctp/endpointola.c b/net/sctp/endpointola.c
> index c8cc24e..924d846 100644
> --- a/net/sctp/endpointola.c
> +++ b/net/sctp/endpointola.c
> @@ -268,12 +268,13 @@ static void sctp_endpoint_destroy(struct sctp_endpoint *ep)
>
>        /* Cleanup. */
>        sctp_inq_free(&ep->base.inqueue);
> -       sctp_bind_addr_free(&ep->base.bind_addr);
>
>        /* Remove and free the port */
>        if (sctp_sk(ep->base.sk)->bind_hash)
>                sctp_put_port(ep->base.sk);
>
> +       sctp_bind_addr_free(&ep->base.bind_addr);
> +
>        /* Give up our hold on the sock. */
>        if (ep->base.sk)
>                sock_put(ep->base.sk);
>

Done tests with that one and looks like it survive the flood.

How then this will be handled is up to you. Both ways fix
the issue which makes me happy as damn hell.

@Eric, if you will take a look into the code you might notice
that there are few places where list operations could be
optimised and the main question here is do we really care
to have the data ,,safe'' so that things like that won't popup.
The good example can be the set of _local_ functions.
Ahhh... and I'm aware of how tricky can be abuse of RCU.

-Jacek
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet - May 18, 2011, 11:41 a.m.
Le mercredi 18 mai 2011 à 13:01 +0200, Jacek Luczak a écrit :

> @Eric, if you will take a look into the code you might notice
> that there are few places where list operations could be
> optimised and the main question here is do we really care
> to have the data ,,safe'' so that things like that won't popup.
> The good example can be the set of _local_ functions.
> Ahhh... and I'm aware of how tricky can be abuse of RCU.

I took a quick look at existing rcu_read_lock() uses in sctp and did not
find other problems [or optimizations if you prefer this point of view].
Please elaborate.



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jacek Luczak - May 18, 2011, 11:58 a.m.
2011/5/18 Eric Dumazet <eric.dumazet@gmail.com>:
> Le mercredi 18 mai 2011 à 13:01 +0200, Jacek Luczak a écrit :
>
>> @Eric, if you will take a look into the code you might notice
>> that there are few places where list operations could be
>> optimised and the main question here is do we really care
>> to have the data ,,safe'' so that things like that won't popup.
>> The good example can be the set of _local_ functions.

There should be a vertical space here ...

>> Ahhh... and I'm aware of how tricky can be abuse of RCU.
>
> I took a quick look at existing rcu_read_lock() uses in sctp and did not
> find other problems [or optimizations if you prefer this point of view].
> Please elaborate.

... so that the last line does not have any connection with what I've
wrote above.

I get your point Eric so you don't really have to prove it - if you still need.

-jacek
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Vlad Yasevich - May 18, 2011, 12:33 p.m.
On 05/18/2011 05:02 AM, Wei Yongjun wrote:
 
> fix the race between sctp_bind_addr_free() and sctp_bind_addr_conflict(), maybe you just
> need to remove the socket from the port hash before empty the bind address list.
> some thing like this, not test.
> 
> diff --git a/net/sctp/endpointola.c b/net/sctp/endpointola.c
> index c8cc24e..924d846 100644
> --- a/net/sctp/endpointola.c
> +++ b/net/sctp/endpointola.c
> @@ -268,12 +268,13 @@ static void sctp_endpoint_destroy(struct sctp_endpoint *ep)
>  
>  	/* Cleanup. */
>  	sctp_inq_free(&ep->base.inqueue);
> -	sctp_bind_addr_free(&ep->base.bind_addr);
>  
>  	/* Remove and free the port */
>  	if (sctp_sk(ep->base.sk)->bind_hash)
>  		sctp_put_port(ep->base.sk);
>  
> +	sctp_bind_addr_free(&ep->base.bind_addr);
> +
>  	/* Give up our hold on the sock. */
>  	if (ep->base.sk)
>  		sock_put(ep->base.sk);
> 
> 

I am not sure that this will guarantee avoidance of this crash, even though it is the right
thing to do in general.

We simply make the race condition much smaller and much harder to hit.  Potentially, one
cpu may be doing lookup of the socket while the other is doing the destroy.  If the lookup cpu
finds the socket just as this code removes the socket from the hash, we still have this potential
race condition.

I agree with Eric, rcu_read_lock() is not strictly necessary, as what we are really after is call_rcu()
based destruction.  We need to delay memory destruction for the rcu grace period.

Thinking a little more about how bind_addr_clean() is used, it would probably benefit from getting
converted to call_rcu().  That function is used as local clean-up in case of malloc failure; however,
that doesn't preclude a potential race.  The fact that we do not have this race simply points out that
we don't have any kind of parallel lookup that can hit it (and the code confirms it).  This doesn't
mean that we wouldn't have it in the future.

-vlad
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch

diff --git a/net/sctp/endpointola.c b/net/sctp/endpointola.c
index c8cc24e..924d846 100644
--- a/net/sctp/endpointola.c
+++ b/net/sctp/endpointola.c
@@ -268,12 +268,13 @@  static void sctp_endpoint_destroy(struct sctp_endpoint *ep)
 
 	/* Cleanup. */
 	sctp_inq_free(&ep->base.inqueue);
-	sctp_bind_addr_free(&ep->base.bind_addr);
 
 	/* Remove and free the port */
 	if (sctp_sk(ep->base.sk)->bind_hash)
 		sctp_put_port(ep->base.sk);
 
+	sctp_bind_addr_free(&ep->base.bind_addr);
+
 	/* Give up our hold on the sock. */
 	if (ep->base.sk)
 		sock_put(ep->base.sk);