
[bug] __nf_ct_refresh_acct(): WARNING: at lib/list_debug.c:30 __list_add+0x7d/0xad()

Message ID 4A3A6143.3040607@gmail.com
State Not Applicable, archived
Delegated to: David Miller

Commit Message

Eric Dumazet June 18, 2009, 3:46 p.m. UTC
Patrick McHardy wrote:
> Patrick McHardy wrote:
>> Eric Dumazet wrote:
>>>> a test-box still triggered this crash overnight:
>>>>
>>>> [  252.433471] ------------[ cut here ]------------
>>>> [  252.436031] WARNING: at lib/list_debug.c:30 __list_add+0x95/0xa0()
>>>> ...
>>>> With your patch (repeated below) applied. Is Patrick's alternative
>>>> patch supposed to fix something that yours does not?
>>>
>>> Hmm, not really; Patrick's patch should fix the same problem, but
>>> without the extra locking mine adds.
>>>
>>> This new stack trace is somewhat different, as the corruption is
>>> detected in the add_timer() call in __nf_conntrack_confirm().
>>>
>>> So there is another problem. CCing Pablo Neira Ayuso, who recently
>>> added some timeout-related logic to netfilter.
>>
>> That timeout logic shouldn't be relevant in this case; it's only
>> activated when netlink event delivery is used, a userspace process
>> is actually listening, and that process has the socket marked for
>> reliable delivery.
>>
>> I think it's still the same problem; the corruption is just detected
>> at a different point.
> 
> I can't find anything wrong with the conntrack timer use itself.
> Since it's the timer base that is corrupted, it's possible that
> something else is using timers incorrectly and conntrack just
> happens to hit the problem afterwards, but that seems rather
> unlikely since the first backtrace originated in a network driver
> and the second one in the socket layer.
> 
> But I've noticed a different problem: the lockless lookup might
> incorrectly return a conntrack entry that is supposed to be
> invisible. This can happen because we only check for equal tuples
> and a successful atomic_inc_not_zero() on the refcount, but the
> conntrack can be removed from the lists while still having
> references to it. It should never have an active timer at that
> point, however, so all following mod_timer_pending() calls *should*
> do nothing unless I'm missing something.
> 
> Ingo, could you please try whether this patch (combined with the
> last one) makes any difference? Enabling CONFIG_NETFILTER_DEBUG
> could also help.
> 
> 
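[For context: the lockless lookup pairs an RCU walk of the hash chain
with a refcount check, and the guard Patrick describes amounts to
rechecking the entry after the reference has been taken. A simplified
sketch along the lines of nf_conntrack_find_get() of that kernel era;
illustration only, not the actual patch under discussion:]

begin:
        rcu_read_lock();
        h = __nf_conntrack_find(net, tuple);    /* lockless RCU walk */
        if (h) {
                ct = nf_ct_tuplehash_to_ctrack(h);
                /* take a reference only if the entry is still live */
                if (unlikely(nf_ct_is_dying(ct) ||
                             !atomic_inc_not_zero(&ct->ct_general.use)))
                        h = NULL;
                else if (unlikely(!nf_ct_tuple_equal(tuple, &h->tuple))) {
                        /*
                         * The slot was recycled (SLAB_DESTROY_BY_RCU)
                         * between the tuple compare and the refcount
                         * increment: drop the reference and retry.
                         */
                        nf_ct_put(ct);
                        rcu_read_unlock();
                        goto begin;
                }
        }
        rcu_read_unlock();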

Interesting!

In my own analysis, I found that death_by_timeout() might be
problematic with RCU and lockless lookups:

static void death_by_timeout(unsigned long ul_conntrack)
{
        struct nf_conn *ct = (void *)ul_conntrack;

        if (!test_bit(IPS_DYING_BIT, &ct->status) &&
            unlikely(nf_conntrack_event(IPCT_DESTROY, ct) < 0)) {
                /* destroy event was not delivered */
                nf_ct_delete_from_lists(ct);
<< HERE >>
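                /*
                 * window: a lockless reader that caught ct on the old
                 * hash chain can, once ct is inserted into the dying
                 * list below, keep walking into that list and end on
                 * an unexpected nulls value
                 */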

                nf_ct_insert_dying_list(ct);
                return;
        }
        set_bit(IPS_DYING_BIT, &ct->status);
        nf_ct_delete_from_lists(ct);
        nf_ct_put(ct);
}


We delete ct from one list and insert it into a new list.

I believe a reader could *catch* ct while doing a lookup and miss the
end of its chain. (The nulls algorithm checks the null value at the end
of a lookup and can decide to restart the lookup if the null value is
not the expected one.)

We need to change nf_conntrack_init_net() to use different "null"
values, guaranteed not to be used in the regular hash lists.
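The reader side looks roughly like this (a simplified sketch of the
nulls walk in __nf_conntrack_find(); illustration only):

begin:
        hlist_nulls_for_each_entry_rcu(h, n, &net->ct.hash[hash], hnnode) {
                if (nf_ct_tuple_equal(tuple, &h->tuple))
                        return h;
        }
        /*
         * If the nulls value reached at the end of the walk is not the
         * one this bucket was initialized with, we probably followed
         * an entry that was moved to another chain: restart the lookup.
         */
        if (get_nulls_value(n) != hash)
                goto begin;
        return NULL;

The regular hash buckets use their bucket index as the nulls value, so
the unconfirmed and dying lists need values that can never collide with
a bucket index; hence the (1<<30)-based constants in the patch below.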

Patch follows:


Comments

Patrick McHardy June 18, 2009, 4:09 p.m. UTC | #1
Eric Dumazet wrote:
> In my own analysis, I found that death_by_timeout() might be
> problematic with RCU and lockless lookups:
> 
> static void death_by_timeout(unsigned long ul_conntrack)
> {
>         struct nf_conn *ct = (void *)ul_conntrack;
> 
>         if (!test_bit(IPS_DYING_BIT, &ct->status) &&
>             unlikely(nf_conntrack_event(IPCT_DESTROY, ct) < 0)) {
>                 /* destroy event was not delivered */
>                 nf_ct_delete_from_lists(ct);
> << HERE >>
> 
>                 nf_ct_insert_dying_list(ct);
>                 return;
>         }
>         set_bit(IPS_DYING_BIT, &ct->status);
>         nf_ct_delete_from_lists(ct);
>         nf_ct_put(ct);
> }
> 
> 
> We delete ct from one list and insert it into a new list.
> 
> I believe a reader could *catch* ct while doing a lookup and miss the
> end of its chain. (The nulls algorithm checks the null value at the end
> of a lookup and can decide to restart the lookup if the null value is
> not the expected one.)
> 
> We need to change nf_conntrack_init_net() to use different "null"
> values, guaranteed not to be used in the regular hash lists.

Good catch. This is a new bug, but it shouldn't matter in this case
since nf_conntrack_event() can't fail unless you have a userspace
listener that makes use of reliable delivery, which I think hasn't
even been released yet.

> Patch follows :

Looks good. If you send me a Signed-off-by: I'll apply it right away.
Pablo Neira Ayuso June 18, 2009, 4:13 p.m. UTC | #2
Patrick McHardy wrote:
> Eric Dumazet wrote:
>> In my own analysis, I found that death_by_timeout() might be
>> problematic with RCU and lockless lookups:
>>
>> static void death_by_timeout(unsigned long ul_conntrack)
>> {
>>         struct nf_conn *ct = (void *)ul_conntrack;
>>
>>         if (!test_bit(IPS_DYING_BIT, &ct->status) &&
>>             unlikely(nf_conntrack_event(IPCT_DESTROY, ct) < 0)) {
>>                 /* destroy event was not delivered */
>>                 nf_ct_delete_from_lists(ct);
>> << HERE >>
>>
>>                 nf_ct_insert_dying_list(ct);
>>                 return;
>>         }
>>         set_bit(IPS_DYING_BIT, &ct->status);
>>         nf_ct_delete_from_lists(ct);
>>         nf_ct_put(ct);
>> }
>>
>>
>> We delete ct from one list and insert it into a new list.
>>
>> I believe a reader could *catch* ct while doing a lookup and miss the
>> end of its chain. (The nulls algorithm checks the null value at the
>> end of a lookup and can decide to restart the lookup if the null
>> value is not the expected one.)
>>
>> We need to change nf_conntrack_init_net() to use different "null"
>> values, guaranteed not to be used in the regular hash lists.
> 
> Good catch. This is a new bug, but it shouldn't matter in this case
> since nf_conntrack_event() can't fail unless you have a userspace
> listener that makes use of reliable delivery, which I think hasn't
> even been released yet.

Indeed. I haven't included user-space support for this in my tree yet,
so this should not be the problem. Thanks for the catch anyway!

Patch

diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index 5f72b94..5276a2d 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -1267,13 +1267,19 @@  err_cache:
 	return ret;
 }
 
+/*
+ * We need to use special "null" values, not used in hash table
+ */
+#define UNCONFIRMED_NULLS_VAL	((1<<30)+0)
+#define DYING_NULLS_VAL		((1<<30)+1)
+
 static int nf_conntrack_init_net(struct net *net)
 {
 	int ret;
 
 	atomic_set(&net->ct.count, 0);
-	INIT_HLIST_NULLS_HEAD(&net->ct.unconfirmed, 0);
-	INIT_HLIST_NULLS_HEAD(&net->ct.dying, 0);
+	INIT_HLIST_NULLS_HEAD(&net->ct.unconfirmed, UNCONFIRMED_NULLS_VAL);
+	INIT_HLIST_NULLS_HEAD(&net->ct.dying, DYING_NULLS_VAL);
 	net->ct.stat = alloc_percpu(struct ip_conntrack_stat);
 	if (!net->ct.stat) {
 		ret = -ENOMEM;