diff mbox

net: Fix sock_wfree() race

Message ID 4ABA262F.9040407@gmail.com
State Superseded, archived
Delegated to: David Miller
Headers show

Commit Message

Eric Dumazet Sept. 23, 2009, 1:44 p.m. UTC
David Miller a écrit :
> From: David Miller <davem@davemloft.net>
> Date: Fri, 11 Sep 2009 11:43:37 -0700 (PDT)
> 
>> From: Eric Dumazet <eric.dumazet@gmail.com>
>> Date: Wed, 09 Sep 2009 00:49:31 +0200
>>
>>> [PATCH] net: Fix sock_wfree() race
>>>
>>> Commit 2b85a34e911bf483c27cfdd124aeb1605145dc80
>>> (net: No more expensive sock_hold()/sock_put() on each tx)
>>> opens a window in sock_wfree() where another cpu
>>> might free the socket we are working on.
>>>
>>> Fix is to call sk->sk_write_space(sk) only
>>> while still holding a reference on sk.
>>>
>>> Since doing this call is done before the 
>>> atomic_sub(truesize, &sk->sk_wmem_alloc), we should pass truesize as 
>>> a bias for possible sk_wmem_alloc evaluations.
>>>
>>> Reported-by: Jike Song <albcamus@gmail.com>
>>> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
>> Applied to net-next-2.6, thanks.  I'll queue up your simpler
>> version for -stable.
> 
> Eric, I have to revert, as you didn't update the callbacks
> of several protocols such as SCTP and RDS in this change.
> 
> Let me know when you have a fixed version of this patch :-)

Sorry for the delay David. But this is complex. I am not
sure we can do a clean and safe thing, not counting
the added bloat.

If we do :

void sock_wfree(struct sk_buff *skb)
{
        struct sock *sk = skb->sk;
        int res;

        if (!sock_flag(sk, SOCK_USE_WRITE_QUEUE))
                sk->sk_write_space(sk, skb->truesize);

        res = atomic_sub_return(skb->truesize, &sk->sk_wmem_alloc);
        /*
         * if sk_wmem_alloc reached 0, we are last user and should
         * free this sock, as sk_free() call could not do it.
         */
        if (res == 0)
                __sk_free(sk);
}


There is still a possibility multiple cpus call sock_wfree()
for the same socket, and that they all call sk_write_space()
with their bias, yet the protocol still has a possible too
big estimation of sk_wmem_alloc

We could miss to wakeup a blocked writer in case low sk->sk_sndbuf
values are setup. (One could argue that with small sk_sndbuf
values we should not have many packets in flight : Keep in mind
sk_sndbuf can be lowered by the user)


With second patch we instead have :

void sock_wfree(struct sk_buff *skb)
{
	struct sock *sk = skb->sk;
	unsigned int len = skb->truesize;

	if (!sock_flag(sk, SOCK_USE_WRITE_QUEUE)) {
		/*
		 * Keep a reference on sk_wmem_alloc, this will be released
		 * after sk_write_space() call
		 */
		atomic_sub(len - 1, &sk->sk_wmem_alloc);
		sk->sk_write_space(sk);
		len = 1;
	}
	/*
	 * if sk_wmem_alloc reaches 0, we must finish what sk_free()
	 * could not do because of in-flight packets
	 */
	if (atomic_sub_return(len, &sk->sk_wmem_alloc) == 0)
		__sk_free(sk);
}

The accumulated transient error on sk_wmem_alloc is then < num_online_cpus(),
that should be OK even for very small sk_sndbuf values.

Of course TCP doesnt have to pay the price of sk_write_space() and the second
atomic operation re-added by this fix.

Here is the patch for reference :

[PATCH] net: Fix sock_wfree() race

Commit 2b85a34e911bf483c27cfdd124aeb1605145dc80
(net: No more expensive sock_hold()/sock_put() on each tx)
opens a window in sock_wfree() where another cpu
might free the socket we are working on.

A fix is to call sk->sk_write_space(sk) while still
holding a reference on sk.


Reported-by: Jike Song <albcamus@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 net/core/sock.c |   19 ++++++++++++-------
 1 files changed, 12 insertions(+), 7 deletions(-)


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Jarek Poplawski Sept. 24, 2009, 8:07 p.m. UTC | #1
Eric Dumazet wrote, On 09/23/2009 03:44 PM:

...
> Here is the patch for reference :
> 
> [PATCH] net: Fix sock_wfree() race
> 
> Commit 2b85a34e911bf483c27cfdd124aeb1605145dc80
> (net: No more expensive sock_hold()/sock_put() on each tx)
> opens a window in sock_wfree() where another cpu
> might free the socket we are working on.
> 
> A fix is to call sk->sk_write_space(sk) while still
> holding a reference on sk.
> 
> 
> Reported-by: Jike Song <albcamus@gmail.com>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> ---
>  net/core/sock.c |   19 ++++++++++++-------
>  1 files changed, 12 insertions(+), 7 deletions(-)
> 
> diff --git a/net/core/sock.c b/net/core/sock.c
> index 30d5446..e1f034e 100644
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -1228,17 +1228,22 @@ void __init sk_init(void)
>  void sock_wfree(struct sk_buff *skb)
>  {
>  	struct sock *sk = skb->sk;
> -	int res;
> +	unsigned int len = skb->truesize;
>  
> -	/* In case it might be waiting for more memory. */
> -	res = atomic_sub_return(skb->truesize, &sk->sk_wmem_alloc);
> -	if (!sock_flag(sk, SOCK_USE_WRITE_QUEUE))
> +	if (!sock_flag(sk, SOCK_USE_WRITE_QUEUE)) {
> +		/*
> +		 * Keep a reference on sk_wmem_alloc, this will be released
> +		 * after sk_write_space() call
> +		 */
> +		atomic_sub(len - 1, &sk->sk_wmem_alloc);
>  		sk->sk_write_space(sk);
> +		len = 1;
> +	}
>  	/*
> -	 * if sk_wmem_alloc reached 0, we are last user and should
> -	 * free this sock, as sk_free() call could not do it.
> +	 * if sk_wmem_alloc reaches 0, we must finish what sk_free()
> +	 * could not do because of in-flight packets
>  	 */
> -	if (res == 0)
> +	if (atomic_sub_return(len, &sk->sk_wmem_alloc) == 0)
>  		__sk_free(sk);


Probably atomic_sub_and_test() is more popular for this.

Jarek P.

>  }
>  EXPORT_SYMBOL(sock_wfree);
> 
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/net/core/sock.c b/net/core/sock.c
index 30d5446..e1f034e 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1228,17 +1228,22 @@  void __init sk_init(void)
 void sock_wfree(struct sk_buff *skb)
 {
 	struct sock *sk = skb->sk;
-	int res;
+	unsigned int len = skb->truesize;
 
-	/* In case it might be waiting for more memory. */
-	res = atomic_sub_return(skb->truesize, &sk->sk_wmem_alloc);
-	if (!sock_flag(sk, SOCK_USE_WRITE_QUEUE))
+	if (!sock_flag(sk, SOCK_USE_WRITE_QUEUE)) {
+		/*
+		 * Keep a reference on sk_wmem_alloc, this will be released
+		 * after sk_write_space() call
+		 */
+		atomic_sub(len - 1, &sk->sk_wmem_alloc);
 		sk->sk_write_space(sk);
+		len = 1;
+	}
 	/*
-	 * if sk_wmem_alloc reached 0, we are last user and should
-	 * free this sock, as sk_free() call could not do it.
+	 * if sk_wmem_alloc reaches 0, we must finish what sk_free()
+	 * could not do because of in-flight packets
 	 */
-	if (res == 0)
+	if (atomic_sub_return(len, &sk->sk_wmem_alloc) == 0)
 		__sk_free(sk);
 }
 EXPORT_SYMBOL(sock_wfree);