
[bug] __nf_ct_refresh_acct(): WARNING: at lib/list_debug.c:30 __list_add+0x7d/0xad()

Message ID: 4A3A5599.4080906@trash.net
State: Not Applicable, archived
Delegated to: David Miller

Commit Message

Patrick McHardy June 18, 2009, 2:56 p.m. UTC
Patrick McHardy wrote:
> Eric Dumazet wrote:
>>> a test-box still triggered this crash overnight:
>>>
>>> [  252.433471] ------------[ cut here ]------------
>>> [  252.436031] WARNING: at lib/list_debug.c:30 __list_add+0x95/0xa0()
>>> ...
>>> This happened with your patch (repeated below) applied. Is Patrick's
>>> alternative patch supposed to fix something that yours does not?
>>
>> Hmm, not really; Patrick's patch should fix the same problem, but
>> without the extra locking mine adds.
>>
>> This new stack trace is somewhat different, as the corruption is
>> detected in the add_timer() call in __nf_conntrack_confirm().
>>
>> So there is another problem. CCed Pablo Neira Ayuso, who recently
>> added some timeout logic to netfilter.
> 
> That timeout logic shouldn't be relevant in this case; it's only
> activated when netlink event delivery is used, a userspace process
> is actually listening, and that process has its socket marked for
> reliable delivery.
> 
> I think it's still the same problem; the corruption is just detected
> at a different point.
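
For context on the condition quoted above: "marked for reliable
delivery" means the listening process has opted in on its netlink
socket. A minimal sketch of such a listener setup, assuming the
NETLINK_BROADCAST_ERROR socket option from the reliable-delivery work
(real ctnetlink listeners also bind to the conntrack event groups,
which is omitted here):

#include <stdio.h>
#include <sys/socket.h>
#include <linux/netlink.h>

/* Fallbacks for older userspace headers; values match the kernel. */
#ifndef SOL_NETLINK
#define SOL_NETLINK 270
#endif
#ifndef NETLINK_BROADCAST_ERROR
#define NETLINK_BROADCAST_ERROR 4
#endif

int main(void)
{
	/* ctnetlink events arrive on an AF_NETLINK/NETLINK_NETFILTER
	 * socket. */
	int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_NETFILTER);
	int on = 1;

	if (fd < 0) {
		perror("socket");
		return 1;
	}
	/* Opt in to reliable delivery: broadcast failures are reported
	 * back instead of events being dropped silently. */
	if (setsockopt(fd, SOL_NETLINK, NETLINK_BROADCAST_ERROR,
		       &on, sizeof(on)) < 0)
		perror("setsockopt");
	return 0;
}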

I can't find anything wrong with the conntrack timer use itself.
Since it's the timer base that is corrupted, it's possible that
something else is using timers incorrectly and conntrack just
happens to hit the problem afterwards. That seems rather unlikely,
though, since the first backtrace originated in a network driver
and the second one in the socket layer.

But I've noticed a different problem: the lockless lookup might
incorrectly return a conntrack entry that is supposed to be
invisible. This can happen because we only check for equal tuples
and take a reference via atomic_inc_not_zero(), but a conntrack
can be removed from the lists while references to it still exist.
It should never have an active timer at that point, however, so
all following mod_timer_pending() calls *should* do nothing unless
I'm missing something.
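
To make that window concrete, here is a minimal userspace C model of
the lookup pattern (the struct, function names and the integer "tuple"
are illustrative stand-ins, not the kernel code): tuple equality plus
a successful refcount increment does not prove the entry is still in
the lists, while an explicit dying check, as in the patch below,
filters unhashed entries out.

#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct entry {
	atomic_int  refcnt; /* may stay > 0 after the entry is unhashed */
	atomic_bool dying;  /* set when the entry leaves the lists */
	int         tuple;  /* stand-in for the real conntrack tuple */
};

/* Take a reference only if the refcount has not already hit zero. */
static bool get_unless_zero(struct entry *e)
{
	int old = atomic_load(&e->refcnt);

	while (old != 0)
		if (atomic_compare_exchange_weak(&e->refcnt, &old, old + 1))
			return true;
	return false;
}

/* Old check: can return an unhashed entry that still has users. */
static struct entry *lookup_old(struct entry *e, int tuple)
{
	if (e && e->tuple == tuple && get_unless_zero(e))
		return e;
	return NULL;
}

/* Patched check: treat dying entries as not found. */
static struct entry *lookup_new(struct entry *e, int tuple)
{
	if (e && e->tuple == tuple && !atomic_load(&e->dying) &&
	    get_unless_zero(e))
		return e;
	return NULL;
}

int main(void)
{
	/* Unhashed ("dying") entry that still holds references. */
	struct entry e = { .refcnt = 2, .dying = true, .tuple = 42 };

	/* The old check wrongly finds it; the new check skips it. */
	return !(lookup_old(&e, 42) != NULL && lookup_new(&e, 42) == NULL);
}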

Ingo, could you please try whether this patch (combined with the
last one) makes any difference? Enabling CONFIG_NETFILTER_DEBUG
could also help.

Comments

Ingo Molnar June 20, 2009, 3:47 p.m. UTC | #1
* Patrick McHardy <kaber@trash.net> wrote:

> Ingo, could you please try whether this patch (combined with the 
> last one) makes any difference? Enabling CONFIG_NETFILTER_DEBUG 
> could also help.

Mind pushing it upstream? I'll keep things monitored for the 
week after it hits upstream.

The reason is that the crash rate is lower than 1:1000; it took a 
day and a thousand tests to trigger that one, and I haven't seen it 
since.

So it's going to be a very slow observation and you shouldn't 
serialize on me - giving you an 'it works' positive result will take 
10,000 random bootp tests or so - that's a week or longer. (It can 
take a long time to hit that, especially in the merge window, when 
there are a lot of test failures that cause hiccups in the test 
stream.)

( Mailing me an upstream sha1 when all fixes in this area have hit 
  upstream would certainly be welcome - I can use that as a 'no 
  crashes expected in that area from that point on' flag day. )

Thanks,

	Ingo
Patrick McHardy June 20, 2009, 3:56 p.m. UTC | #2
Ingo Molnar wrote:
> * Patrick McHardy <kaber@trash.net> wrote:
> 
>> Ingo, could you please try whether this patch (combined with the 
>> last one) makes any difference? Enabling CONFIG_NETFILTER_DEBUG 
>> could also help.
> 
> Mind pushing it upstream? I'll keep things monitored for the 
> week after it hits upstream.
> 
> The reason is that the crash rate is lower than 1:1000; it took a 
> day and a thousand tests to trigger that one, and I haven't seen it 
> since.
> 
> So it's going to be a very slow observation and you shouldn't 
> serialize on me - giving you an 'it works' positive result will take 
> 10,000 random bootp tests or so - that's a week or longer. (It can 
> take a long time to hit that, especially in the merge window, when 
> there are a lot of test failures that cause hiccups in the test 
> stream.)

OK, thanks. I'll push it upstream soon.

> ( Mailing me an upstream sha1 when all fixes in this area have hit 
>   upstream would certainly be welcome - I can use that as a 'no 
>   crashes expected in that area from that point on' flag day. )

Sure, will do.


Patch

diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index 5f72b94..ce17a69 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -335,7 +335,8 @@  begin:
 	h = __nf_conntrack_find(net, tuple);
 	if (h) {
 		ct = nf_ct_tuplehash_to_ctrack(h);
-		if (unlikely(!atomic_inc_not_zero(&ct->ct_general.use)))
+		if (unlikely(nf_ct_is_dying(ct) ||
+			     !atomic_inc_not_zero(&ct->ct_general.use)))
 			h = NULL;
 		else {
 			if (unlikely(!nf_ct_tuple_equal(tuple, &h->tuple))) {
@@ -503,7 +504,8 @@  static noinline int early_drop(struct net *net, unsigned int hash)
 			cnt++;
 		}
 
-		if (ct && unlikely(!atomic_inc_not_zero(&ct->ct_general.use)))
+		if (ct && unlikely(nf_ct_is_dying(ct) ||
+				   !atomic_inc_not_zero(&ct->ct_general.use)))
 			ct = NULL;
 		if (ct || cnt >= NF_CT_EVICTION_RANGE)
 			break;
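
Both hunks apply the same idea: nf_ct_is_dying() is checked before
atomic_inc_not_zero(), so an entry that has been removed from the
lists but still has references outstanding is treated as not found,
in the regular lookup path as well as in early_drop().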