Patchwork [bug] __nf_ct_refresh_acct(): WARNING: at lib/list_debug.c:30 __list_add+0x7d/0xad()

login
register
mail settings
Submitter Eric Dumazet
Date June 17, 2009, 11:55 a.m.
Message ID <4A38D9BE.3020403@gmail.com>
Download mbox | patch
Permalink /patch/28778/
State RFC
Delegated to: David Miller
Headers show

Comments

Eric Dumazet - June 17, 2009, 11:55 a.m.
Patrick McHardy a écrit :
> Eric Dumazet wrote:
>> IPS_CONFIRMED_BIT is set under nf_conntrack_lock (in
>> __nf_conntrack_confirm()),
>> we probably want to add a synchronisation under ct->lock as well,
>> or __nf_ct_refresh_acct() could set ct->timeout.expires to extra_jiffies,
>> while a different cpu could confirm the conntrack.
> 
> Before the conntrack is confirmed, it is exclusively handled by a
> single CPU. I agree that we need to make sure the IPS_CONFIRMED_BIT
> is visible before we add the conntrack to the hash table since the
> lookup is lockless, but simply moving the set_bit before the hash
> insertion should be fine I think.
> 

Hmm...  now we could have the reverse case :

__nf_conntrack_confirm() could be "interrupted" by __nf_ct_refresh_acct()

index 5f72b94..22755fa 100644
<< or here ? >>


 	atomic_inc(&ct->ct_general.use);
-	set_bit(IPS_CONFIRMED_BIT, &ct->status);
 	NF_CT_STAT_INC(net, insert);
 	spin_unlock_bh(&nf_conntrack_lock);
 	help = nfct_help(ct);



Problem is timeout.expires is either a relative or absolute timeout, and changes happen
in __nf_conntrack_confirm() or __nf_ct_refresh_acct().

We must have a synchronization (an barriers), a single bit wont be enough.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Patrick McHardy - June 17, 2009, noon
Eric Dumazet wrote:
> Patrick McHardy a écrit :
>> Before the conntrack is confirmed, it is exclusively handled by a
>> single CPU. I agree that we need to make sure the IPS_CONFIRMED_BIT
>> is visible before we add the conntrack to the hash table since the
>> lookup is lockless, but simply moving the set_bit before the hash
>> insertion should be fine I think.
>>
> 
> Hmm...  now we could have the reverse case :
> 
> __nf_conntrack_confirm() could be "interrupted" by __nf_ct_refresh_acct()
> 
> index 5f72b94..22755fa 100644
> --- a/net/netfilter/nf_conntrack_core.c
> +++ b/net/netfilter/nf_conntrack_core.c
> @@ -425,6 +425,7 @@ __nf_conntrack_confirm(struct sk_buff *skb)
>  	/* Remove from unconfirmed list */
>  	hlist_nulls_del_rcu(&ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode);
>  
> +	set_bit(IPS_CONFIRMED_BIT, &ct->status);
>  	__nf_conntrack_hash_insert(ct, hash, repl_hash);
>  	/* Timer relative to confirmation time, not original
>  	   setting time, otherwise we'd get timer wrap in
> @@ -432,7 +433,6 @@ __nf_conntrack_confirm(struct sk_buff *skb)
>  	ct->timeout.expires += jiffies;
> 
> << What happens if another packet is handled by __nf_ct_refresh_acct here >>
> (seeing or not the IPS_CONFIRMED_BIT) >>
> 
>  	add_timer(&ct->timeout);
> 
> << or here ? >>
> 
> 
>  	atomic_inc(&ct->ct_general.use);
> -	set_bit(IPS_CONFIRMED_BIT, &ct->status);
>  	NF_CT_STAT_INC(net, insert);
>  	spin_unlock_bh(&nf_conntrack_lock);
>  	help = nfct_help(ct);
> 
> Problem is timeout.expires is either a relative or absolute timeout, and changes happen
> in __nf_conntrack_confirm() or __nf_ct_refresh_acct().
> 
> We must have a synchronization (an barriers), a single bit wont be enough.

Please have a look at the second patch I just sent. It relies
on the RCU barriers to make sure all stores are visible before
other CPUs can find the conntrack.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet - June 17, 2009, 12:33 p.m.
Patrick McHardy a écrit :
> Eric Dumazet wrote:
>> Patrick McHardy a écrit :
>>> Before the conntrack is confirmed, it is exclusively handled by a
>>> single CPU. I agree that we need to make sure the IPS_CONFIRMED_BIT
>>> is visible before we add the conntrack to the hash table since the
>>> lookup is lockless, but simply moving the set_bit before the hash
>>> insertion should be fine I think.
>>>
>>
>>
>> Problem is timeout.expires is either a relative or absolute timeout,
>> and changes happen
>> in __nf_conntrack_confirm() or __nf_ct_refresh_acct().
>>
>> We must have a synchronization (an barriers), a single bit wont be
>> enough.
> 
> Please have a look at the second patch I just sent. It relies
> on the RCU barriers to make sure all stores are visible before
> other CPUs can find the conntrack.
> 

Sorry, I dont understand how your second patch corrects the problem.

This (unconfirmed) conntrack is visible by another cpu. This other
cpu can call __nf_ct_refresh_acct() while this cpu runs
in __nf_conntrack_confirm()

@@ -425,7 +425,6 @@ __nf_conntrack_confirm(struct sk_buff *skb)
 	/* Remove from unconfirmed list */
 	hlist_nulls_del_rcu(&ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode);
 
-	__nf_conntrack_hash_insert(ct, hash, repl_hash);
 	/* Timer relative to confirmation time, not original
 	   setting time, otherwise we'd get timer wrap in
 	   weird delay cases. */
@@ -433,8 +432,15 @@ __nf_conntrack_confirm(struct sk_buff *skb)
 	add_timer(&ct->timeout);

<<<< another cpu could here change timeout.expires (thinking its still relative) >>>>

 	atomic_inc(&ct->ct_general.use);
 	set_bit(IPS_CONFIRMED_BIT, &ct->status);
+
+	/* Since the lookup is lockless, hash insertion must be after starting the
+	 * timer and setting the CONFIRMED bit. The RCU barriers guarantee that no
+	 * other CPU can find the conntrack before the above stores are visible.
+	 */
+	__nf_conntrack_hash_insert(ct, hash, repl_hash);
 	NF_CT_STAT_INC(net, insert);
 	spin_unlock_bh(&nf_conntrack_lock);
 	help = nfct_help(ct);
 	if (help && help->helper)
 		nf_conntrack_event_cache(IPCT_HELPER, ct);
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Patrick McHardy - June 17, 2009, 12:36 p.m.
Eric Dumazet wrote:
> Patrick McHardy a écrit :
>> Eric Dumazet wrote:
>>> Patrick McHardy a écrit :
>>>> Before the conntrack is confirmed, it is exclusively handled by a
>>>> single CPU. I agree that we need to make sure the IPS_CONFIRMED_BIT
>>>> is visible before we add the conntrack to the hash table since the
>>>> lookup is lockless, but simply moving the set_bit before the hash
>>>> insertion should be fine I think.
>>>>
>>>
>>> Problem is timeout.expires is either a relative or absolute timeout,
>>> and changes happen
>>> in __nf_conntrack_confirm() or __nf_ct_refresh_acct().
>>>
>>> We must have a synchronization (an barriers), a single bit wont be
>>> enough.
>> Please have a look at the second patch I just sent. It relies
>> on the RCU barriers to make sure all stores are visible before
>> other CPUs can find the conntrack.
>>
> 
> Sorry, I dont understand how your second patch corrects the problem.
> 
> This (unconfirmed) conntrack is visible by another cpu.

No, before it is confirmed, its only visible to the CPU handling
the initial packet of a connection. Confirmation is the step that
makes it visible to other CPUs.

> This other
> cpu can call __nf_ct_refresh_acct() while this cpu runs
> in __nf_conntrack_confirm()

Not for the same conntrack, that would be a seperate bug.

Does that explain what I'm trying to do? :)

> 
> @@ -425,7 +425,6 @@ __nf_conntrack_confirm(struct sk_buff *skb)
>  	/* Remove from unconfirmed list */
>  	hlist_nulls_del_rcu(&ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode);
>  
> -	__nf_conntrack_hash_insert(ct, hash, repl_hash);
>  	/* Timer relative to confirmation time, not original
>  	   setting time, otherwise we'd get timer wrap in
>  	   weird delay cases. */
> @@ -433,8 +432,15 @@ __nf_conntrack_confirm(struct sk_buff *skb)
>  	add_timer(&ct->timeout);
> 
> <<<< another cpu could here change timeout.expires (thinking its still relative) >>>>
> 
>  	atomic_inc(&ct->ct_general.use);
>  	set_bit(IPS_CONFIRMED_BIT, &ct->status);
> +
> +	/* Since the lookup is lockless, hash insertion must be after starting the
> +	 * timer and setting the CONFIRMED bit. The RCU barriers guarantee that no
> +	 * other CPU can find the conntrack before the above stores are visible.
> +	 */
> +	__nf_conntrack_hash_insert(ct, hash, repl_hash);
>  	NF_CT_STAT_INC(net, insert);
>  	spin_unlock_bh(&nf_conntrack_lock);
>  	help = nfct_help(ct);
>  	if (help && help->helper)
>  		nf_conntrack_event_cache(IPCT_HELPER, ct);
> 

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet - June 17, 2009, 1:27 p.m.
Patrick McHardy a écrit :
> Eric Dumazet wrote:
>> Patrick McHardy a écrit :
>>> Eric Dumazet wrote:
>>>> Patrick McHardy a écrit :
>>>>> Before the conntrack is confirmed, it is exclusively handled by a
>>>>> single CPU. I agree that we need to make sure the IPS_CONFIRMED_BIT
>>>>> is visible before we add the conntrack to the hash table since the
>>>>> lookup is lockless, but simply moving the set_bit before the hash
>>>>> insertion should be fine I think.
>>>>>
>>>>
>>>> Problem is timeout.expires is either a relative or absolute timeout,
>>>> and changes happen
>>>> in __nf_conntrack_confirm() or __nf_ct_refresh_acct().
>>>>
>>>> We must have a synchronization (an barriers), a single bit wont be
>>>> enough.
>>> Please have a look at the second patch I just sent. It relies
>>> on the RCU barriers to make sure all stores are visible before
>>> other CPUs can find the conntrack.
>>>
>>
>> Sorry, I dont understand how your second patch corrects the problem.
>>
>> This (unconfirmed) conntrack is visible by another cpu.
> 
> No, before it is confirmed, its only visible to the CPU handling
> the initial packet of a connection. Confirmation is the step that
> makes it visible to other CPUs.

Thanks Patrick, I missed this, and your patch seems fine now :)

> 
>> This other
>> cpu can call __nf_ct_refresh_acct() while this cpu runs
>> in __nf_conntrack_confirm()
> 
> Not for the same conntrack, that would be a seperate bug.
> 
> Does that explain what I'm trying to do? :)


Yes sure, thanks again.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Patrick McHardy - June 17, 2009, 1:29 p.m.
Eric Dumazet wrote:
> Patrick McHardy a écrit :
>> Eric Dumazet wrote:
>>> Patrick McHardy a écrit :
>>>> Eric Dumazet wrote:
>>>>> Patrick McHardy a écrit :
>>>>>> Before the conntrack is confirmed, it is exclusively handled by a
>>>>>> single CPU. I agree that we need to make sure the IPS_CONFIRMED_BIT
>>>>>> is visible before we add the conntrack to the hash table since the
>>>>>> lookup is lockless, but simply moving the set_bit before the hash
>>>>>> insertion should be fine I think.
>>>>>>
>>>>> Problem is timeout.expires is either a relative or absolute timeout,
>>>>> and changes happen
>>>>> in __nf_conntrack_confirm() or __nf_ct_refresh_acct().
>>>>>
>>>>> We must have a synchronization (an barriers), a single bit wont be
>>>>> enough.
>>>> Please have a look at the second patch I just sent. It relies
>>>> on the RCU barriers to make sure all stores are visible before
>>>> other CPUs can find the conntrack.
>>>>
>>> Sorry, I dont understand how your second patch corrects the problem.
>>>
>>> This (unconfirmed) conntrack is visible by another cpu.
>> No, before it is confirmed, its only visible to the CPU handling
>> the initial packet of a connection. Confirmation is the step that
>> makes it visible to other CPUs.
> 
> Thanks Patrick, I missed this, and your patch seems fine now :)

Thanks for your help, I'll send it to Dave later today.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch

--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -425,6 +425,7 @@  __nf_conntrack_confirm(struct sk_buff *skb)
 	/* Remove from unconfirmed list */
 	hlist_nulls_del_rcu(&ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode);
 
+	set_bit(IPS_CONFIRMED_BIT, &ct->status);
 	__nf_conntrack_hash_insert(ct, hash, repl_hash);
 	/* Timer relative to confirmation time, not original
 	   setting time, otherwise we'd get timer wrap in
@@ -432,7 +433,6 @@  __nf_conntrack_confirm(struct sk_buff *skb)
 	ct->timeout.expires += jiffies;

<< What happens if another packet is handled by __nf_ct_refresh_acct here >>
(seeing or not the IPS_CONFIRMED_BIT) >>

 	add_timer(&ct->timeout);