diff mbox

[Bugme-new,Bug,13760] New: 2.6.30 kernel locks up with pppoe in back trace (regression)

Message ID 4A6EBA88.8030205@cosmosbay.com
State Rejected, archived
Delegated to: David Miller
Headers show

Commit Message

Eric Dumazet July 28, 2009, 8:44 a.m. UTC
Igor M Podlesny a écrit :
> [...]
>> Could have been a problem in net core, perhaps.
>>
>> Below is a ppp fix from 2.6.31, but it seems unlikely to fix your problem.
>>
>> It would help if we could see that trace, please.  A digital photo
>> would suit.
> 
> 	Here it is:
> 
> 		http://bugzilla.kernel.org/attachment.cgi?id=22516
> 
> 	(It's 2.6.30.3)
> 	

Looking at this, I believe net_assign_generic() is not safe.

Two cpus could try to expand/update the array at same time, one update could be lost.

register_pernet_gen_device() has a mutex to guard against concurrent
calls, but net_assign_generic() has no locking at all.

I doubt this is the reason of the crash, still worth to mention it...

[PATCH] net: net_assign_generic() is not SMP safe

Two cpus could try to expand/update the array at same time, one update
could be lost during the copy of old array.

Re-using net_mutex is an easy way to fix this, it was used right
before to allocate the 'id'

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Pavel Emelyanov July 28, 2009, 9:51 a.m. UTC | #1
Eric Dumazet wrote:
> Igor M Podlesny a écrit :
>> [...]
>>> Could have been a problem in net core, perhaps.
>>>
>>> Below is a ppp fix from 2.6.31, but it seems unlikely to fix your problem.
>>>
>>> It would help if we could see that trace, please.  A digital photo
>>> would suit.
>> 	Here it is:
>>
>> 		http://bugzilla.kernel.org/attachment.cgi?id=22516
>>
>> 	(It's 2.6.30.3)
>> 	
> 
> Looking at this, I believe net_assign_generic() is not safe.
> 
> Two cpus could try to expand/update the array at same time, one update could be lost.
> 
> register_pernet_gen_device() has a mutex to guard against concurrent
> calls, but net_assign_generic() has no locking at all.
> 
> I doubt this is the reason of the crash, still worth to mention it...
> 
> [PATCH] net: net_assign_generic() is not SMP safe
> 
> Two cpus could try to expand/update the array at same time, one update
> could be lost during the copy of old array.

How can this happen? The array is updated only during ->init routines
of the pernet_operations, which are called from under the net_mutex.

Do I miss anything?

> Re-using net_mutex is an easy way to fix this, it was used right
> before to allocate the 'id'
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> ---
> 
> diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
> index b7292a2..9c31ad1 100644
> --- a/net/core/net_namespace.c
> +++ b/net/core/net_namespace.c
> @@ -467,15 +467,17 @@ int net_assign_generic(struct net *net, int id, void *data)
>  	BUG_ON(!mutex_is_locked(&net_mutex));
>  	BUG_ON(id == 0);
>  
> +	mutex_lock(&net_mutex);
>  	ng = old_ng = net->gen;
>  	if (old_ng->len >= id)
>  		goto assign;
>  
>  	ng = kzalloc(sizeof(struct net_generic) +
>  			id * sizeof(void *), GFP_KERNEL);
> -	if (ng == NULL)
> +	if (ng == NULL) {
> +		mutex_unlock(&net_mutex);
>  		return -ENOMEM;
> -
> +	}
>  	/*
>  	 * Some synchronisation notes:
>  	 *
> @@ -494,6 +496,7 @@ int net_assign_generic(struct net *net, int id, void *data)
>  	call_rcu(&old_ng->rcu, net_generic_release);
>  assign:
>  	ng->ptr[id - 1] = data;
> +	mutex_unlock(&net_mutex);
>  	return 0;
>  }
>  EXPORT_SYMBOL_GPL(net_assign_generic);
> 
> 

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet July 28, 2009, 12:30 p.m. UTC | #2
Pavel Emelyanov a écrit :
> Eric Dumazet wrote:
>> Igor M Podlesny a écrit :
>>> [...]
>>>> Could have been a problem in net core, perhaps.
>>>>
>>>> Below is a ppp fix from 2.6.31, but it seems unlikely to fix your problem.
>>>>
>>>> It would help if we could see that trace, please.  A digital photo
>>>> would suit.
>>> 	Here it is:
>>>
>>> 		http://bugzilla.kernel.org/attachment.cgi?id=22516
>>>
>>> 	(It's 2.6.30.3)
>>> 	
>> Looking at this, I believe net_assign_generic() is not safe.
>>
>> Two cpus could try to expand/update the array at same time, one update could be lost.
>>
>> register_pernet_gen_device() has a mutex to guard against concurrent
>> calls, but net_assign_generic() has no locking at all.
>>
>> I doubt this is the reason of the crash, still worth to mention it...
>>
>> [PATCH] net: net_assign_generic() is not SMP safe
>>
>> Two cpus could try to expand/update the array at same time, one update
>> could be lost during the copy of old array.
> 
> How can this happen? The array is updated only during ->init routines
> of the pernet_operations, which are called from under the net_mutex.
> 
> Do I miss anything?
> 

Oops, I missed the obvious "BUG_ON(!mutex_is_locked(&net_mutex));"

Sorry for the noise and untested patch as well :)

>> Re-using net_mutex is an easy way to fix this, it was used right
>> before to allocate the 'id'
>>
>> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
>> ---
>>
>> diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
>> index b7292a2..9c31ad1 100644
>> --- a/net/core/net_namespace.c
>> +++ b/net/core/net_namespace.c
>> @@ -467,15 +467,17 @@ int net_assign_generic(struct net *net, int id, void *data)
>>  	BUG_ON(!mutex_is_locked(&net_mutex));
>>  	BUG_ON(id == 0);
>>  
>> +	mutex_lock(&net_mutex);
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index b7292a2..9c31ad1 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -467,15 +467,17 @@  int net_assign_generic(struct net *net, int id, void *data)
 	BUG_ON(!mutex_is_locked(&net_mutex));
 	BUG_ON(id == 0);
 
+	mutex_lock(&net_mutex);
 	ng = old_ng = net->gen;
 	if (old_ng->len >= id)
 		goto assign;
 
 	ng = kzalloc(sizeof(struct net_generic) +
 			id * sizeof(void *), GFP_KERNEL);
-	if (ng == NULL)
+	if (ng == NULL) {
+		mutex_unlock(&net_mutex);
 		return -ENOMEM;
-
+	}
 	/*
 	 * Some synchronisation notes:
 	 *
@@ -494,6 +496,7 @@  int net_assign_generic(struct net *net, int id, void *data)
 	call_rcu(&old_ng->rcu, net_generic_release);
 assign:
 	ng->ptr[id - 1] = data;
+	mutex_unlock(&net_mutex);
 	return 0;
 }
 EXPORT_SYMBOL_GPL(net_assign_generic);