
kernel panic when running /etc/init.d/iptables restart

Message ID CAFFEFTXxsPRf1Rmxyih0wL4O6q8G0OFmz0iRLX0tOYedYfajHw@mail.gmail.com
State Not Applicable

Commit Message

canqun zhang Dec. 24, 2012, 5:51 a.m. UTC
Hi Patrick,
If I start one lxc container instance, there will be two net
namespaces in the system: one is the init_net namespace and the other
is the one created by lxc. If I then run "/etc/init.d/iptables
restart", the system panics. I found that the iptables restart cleans
up the init_net namespace first and then the net namespace created by
lxc, but the functions that clean up the init_net namespace also
destroy global variables such as nf_ct_destroy, ip_ct_attach, etc., so
the functions cleaning up the other net namespace panic.

I fixed it up (see below). When the init_net namespace needs to be
cleaned up, the conntrack entries belonging to the other namespaces
are cleaned up first.

        nf_ct_iterate_cleanup(net, kill_all, NULL);
        nf_ct_release_dying_list(net);
        if (atomic_read(&net->ct.count) != 0) {
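
For context, the top-level cleanup path that clears those globals looks
roughly like this in 2.6.32-era kernels (a simplified sketch with comments
added; details may differ between versions):

void nf_conntrack_cleanup(struct net *net)
{
        /* the attach hook is cleared as soon as init_net goes away ... */
        if (net_eq(net, &init_net))
                rcu_assign_pointer(ip_ct_attach, NULL);

        /* let in-flight packets leave the netfilter framework */
        synchronize_net();

        nf_conntrack_cleanup_net(net);          /* per-namespace teardown */

        /* ... and so is nf_ct_destroy, even though other namespaces may
         * still hold conntrack entries that need it later */
        if (net_eq(net, &init_net)) {
                rcu_assign_pointer(nf_ct_destroy, NULL);
                nf_conntrack_cleanup_init_net();
        }
}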

Comments

Gao feng Dec. 25, 2012, 5:36 a.m. UTC | #1
cc netdev
Hi canqun:

On 2012/12/24 13:51, canqun zhang wrote:
> Hi Patrick,
> If I start one lxc container instance, there will be two net
> namespaces in the system: one is the init_net namespace and the other
> is the one created by lxc. If I then run "/etc/init.d/iptables
> restart", the system panics. I found that the iptables restart cleans
> up the init_net namespace first and then the net namespace created by
> lxc, but the functions that clean up the init_net namespace also
> destroy global variables such as nf_ct_destroy, ip_ct_attach, etc., so
> the functions cleaning up the other net namespace panic.
> 

I'm afraid that the system will not panic.
When we do rmmod nf_conntrack_ipv[4,6], we already call nf_ct_iterate_cleanup
to destroy the nf_conns which belong to the l[3,4]proto protocols. At this
point nf_ct_destroy still points to destroy_conntrack, because the
nf_conntrack module is held by the l3 and l4 protos.
You can check the function nf_conntrack_l[3,4]proto_unregister.
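
For reference, the unregister path looks roughly like this in kernels of
this era (a simplified sketch; locking trimmed and details may differ
between versions):

void nf_conntrack_l3proto_unregister(struct nf_conntrack_l3proto *proto)
{
        struct net *net;

        /* fall back to the generic l3proto before flushing entries */
        rcu_assign_pointer(nf_ct_l3protos[proto->l3proto],
                           &nf_conntrack_l3proto_generic);
        synchronize_rcu();

        /* remove the conntrack entries of this protocol in every netns;
         * nf_conntrack is still held here, so nf_ct_destroy is valid */
        rtnl_lock();
        for_each_net(net)
                nf_ct_iterate_cleanup(net, kill_l3proto, proto);
        rtnl_unlock();
}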

Can you make it a little clearer?
The reproduction steps and the oops stack dump would be useful.

Thanks!
canqun zhang Dec. 25, 2012, 7:25 a.m. UTC | #2
Hi Gao feng
The stack information is as follows. The kernel panics because
nf_ct_destroy is NULL.
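
The crash happens in nf_conntrack_destroy(), which in this kernel looks
roughly like the sketch below (comments added); the BUG_ON() fires once the
init_net cleanup has already set nf_ct_destroy to NULL:

void nf_conntrack_destroy(struct nf_conntrack *nfct)
{
        void (*destroy)(struct nf_conntrack *);

        rcu_read_lock();
        destroy = rcu_dereference(nf_ct_destroy);
        BUG_ON(destroy == NULL);        /* the ud2 in the Code: line below */
        destroy(nfct);
        rcu_read_unlock();
}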

Reproduction:
(1) start an lxc container
(2) iptables -t nat -A POSTROUTING -s 10.48.254.18 -o eth1 -j
MASQUERADE (run it on the host machine)
(3) /etc/init.d/iptables save (run it on the host machine)
(4) /etc/init.d/iptables restart (run it on the host machine)

Stack:
Pid: 0, comm: swapper Not tainted 2.6.32-279.14.1.rc3.el6.x86_64 #1
IBM System x3650 M4 -[7915IA4]-/00J6528
RIP: 0010:[<ffffffff81466949>]  [<ffffffff81466949>]
nf_conntrack_destroy+0x19/0x30
RSP: 0018:ffff880028303ab0  EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff881051b237c0 RCX: 00000000000000d4
RDX: 0000000000000500 RSI: 0000000000000000 RDI: ffff881056bc0528
RBP: ffff880028303ab0 R08: ffff8810514fc020 R09: ffff8810574b6110
R10: 0000000000000000 R11: 0000000000000004 R12: ffffffff814445fe
R13: ffff88105327fba8 R14: ffff882059ed6e00 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff880028300000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00007f3484002098 CR3: 0000002056792000 CR4: 00000000000406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper (pid: 0, threadinfo ffff8810590b2000, task ffff88205954c040)
Stack:
 ffff880028303ad0 ffffffff8142febd ffff880028303b30 ffff881051b237c0
<d> ffff880028303af0 ffffffff8142fc36 0000000200000004 ffff881051b237c0
<d> ffff880028303b20 ffffffff8142fd82 ffff880028303b20 ffff882059ed6d80
Call Trace:
 <IRQ>
 [<ffffffff8142febd>] skb_release_head_state+0xed/0x110
 [<ffffffff8142fc36>] __kfree_skb+0x16/0xa0
 [<ffffffff8142fd82>] kfree_skb+0x42/0x90
 [<ffffffff814445fe>] __neigh_event_send+0x11e/0x1e0
 [<ffffffff814447f3>] neigh_resolve_output+0x133/0x370
 [<ffffffff81054d55>] ? select_idle_sibling+0x95/0x150
 [<ffffffff814774b7>] ip_finish_output+0x237/0x310
 [<ffffffff81477648>] ip_output+0xb8/0xc0
 [<ffffffff81476945>] ip_local_out+0x25/0x30
 [<ffffffff81476e20>] ip_queue_xmit+0x190/0x420
 [<ffffffff8106012c>] ? try_to_wake_up+0x24c/0x3e0
 [<ffffffff8148bbae>] tcp_transmit_skb+0x3fe/0x7b0
 [<ffffffff8148cfda>] tcp_retransmit_skb+0x1ba/0x5f0
 [<ffffffff81053463>] ? __wake_up+0x53/0x70
 [<ffffffff8148fd00>] ? tcp_write_timer+0x0/0x200
 [<ffffffff8148f85f>] tcp_retransmit_timer+0x1df/0x680
 [<ffffffff8148fe98>] tcp_write_timer+0x198/0x200
 [<ffffffff8107e907>] run_timer_softirq+0x197/0x340
 [<ffffffff810a2350>] ? tick_sched_timer+0x0/0xc0
 [<ffffffff8102b40d>] ? lapic_next_event+0x1d/0x30
 [<ffffffff81073f31>] __do_softirq+0xc1/0x1e0
 [<ffffffff81096d30>] ? hrtimer_interrupt+0x140/0x250
 [<ffffffff8100c24c>] call_softirq+0x1c/0x30
 [<ffffffff8100de85>] do_softirq+0x65/0xa0
 [<ffffffff81073d15>] irq_exit+0x85/0x90
 [<ffffffff81506050>] smp_apic_timer_interrupt+0x70/0x9b
 [<ffffffff8100bc13>] apic_timer_interrupt+0x13/0x20
 <EOI>
 [<ffffffff812cd9ae>] ? intel_idle+0xde/0x170
 [<ffffffff812cd991>] ? intel_idle+0xc1/0x170
 [<ffffffff8109922d>] ? sched_clock_cpu+0xcd/0x110
 [<ffffffff81407827>] cpuidle_idle_call+0xa7/0x140
 [<ffffffff81009e06>] cpu_idle+0xb6/0x110
 [<ffffffff814f714f>] start_secondary+0x22a/0x26d
Code: 02 ff d0 c9 c3 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89
e5 0f 1f 44 00 00 48 8b 05 90 7c ba 00 48 85 c0 74 04 ff d0 c9 c3 <0f>
0b 0f 1f 44 00 00 eb f9 66 66 66 66 66
2e 0f 1f 84 00 00 00

2012/12/25 Gao feng <gaofeng@cn.fujitsu.com>:
> cc netdev
> Hi canqun:
>
> On 2012/12/24 13:51, canqun zhang wrote:
>> Hi Patrick,
>> If I start one lxc container instance, there will be two net
>> namespaces in the system: one is the init_net namespace and the other
>> is the one created by lxc. If I then run "/etc/init.d/iptables
>> restart", the system panics. I found that the iptables restart cleans
>> up the init_net namespace first and then the net namespace created by
>> lxc, but the functions that clean up the init_net namespace also
>> destroy global variables such as nf_ct_destroy, ip_ct_attach, etc., so
>> the functions cleaning up the other net namespace panic.
>>
>
> I'm afraid that the system will not panic.
> When we do rmmod nf_conntrack_ipv[4,6], we already call nf_ct_iterate_cleanup
> to destroy the nf_conns which belong to the l[3,4]proto protocols. At this
> point nf_ct_destroy still points to destroy_conntrack, because the
> nf_conntrack module is held by the l3 and l4 protos.
> You can check the function nf_conntrack_l[3,4]proto_unregister.
>
> Can you make it a little clearer?
> The reproduction steps and the oops stack dump would be useful.
>
> Thanks!
Gao feng Dec. 25, 2012, 8:38 a.m. UTC | #3
On 2012/12/25 15:25, canqun zhang wrote:
> Hi Gao feng
> The stack information is as follows. The kernel panics because
> nf_ct_destroy is NULL.
> 
> Reproduction:
> (1) start an lxc container
> (2) iptables -t nat -A POSTROUTING -s 10.48.254.18 -o eth1 -j
> MASQUERADE (run it on the host machine)
> (3) /etc/init.d/iptables save (run it on the host machine)
> (4) /etc/init.d/iptables restart (run it on the host machine)
> 

Thanks!
It seems that nf_conntrack_l[3,4]proto_unregister doesn't make sure that
the nf_conns of the proto are destroyed.

If I'm right, there is another problem even if your fix solves this panic.
The l3/l4 proto will be unregistered before all of its nf_conns are destroyed.
So even if nf_ct_destroy is not NULL, in destroy_conntrack we are not able to
find the right l4proto; l4proto->destroy will be incorrect and resources will
not be released correctly.
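
To illustrate, destroy_conntrack() locates the l4proto roughly like this
(simplified excerpt; most of the function trimmed), so after the unregister
the lookup falls back to the generic protocol and the real destroy hook is
never run:

static void destroy_conntrack(struct nf_conntrack *nfct)
{
        struct nf_conn *ct = (struct nf_conn *)nfct;
        struct nf_conntrack_l4proto *l4proto;

        /* after nf_conntrack_l4proto_unregister this returns the generic
         * protocol, whose destroy hook is not the one this ct needs */
        l4proto = __nf_ct_l4proto_find(nf_ct_l3num(ct), nf_ct_protonum(ct));
        if (l4proto && l4proto->destroy)
                l4proto->destroy(ct);

        /* ... hash removal and nf_conntrack_free(ct) omitted ... */
}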

So I think the root problem is that we do the register/set and the
unregister/unset both on the first net (init_net). Maybe it's better to do the
register/set on the first net and the unregister/unset on the last net.
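
A hypothetical sketch of that direction (not real kernel code; the counter
and the exact hook points are made up for illustration):

static atomic_t nf_ct_nets_in_use = ATOMIC_INIT(0);     /* made-up counter */

void nf_conntrack_cleanup(struct net *net)
{
        synchronize_net();
        nf_conntrack_cleanup_net(net);          /* per-namespace teardown */

        /* only drop the global hooks when the *last* conntrack-using
         * namespace goes away, not when init_net happens to be cleaned */
        if (atomic_dec_and_test(&nf_ct_nets_in_use)) {
                rcu_assign_pointer(ip_ct_attach, NULL);
                rcu_assign_pointer(nf_ct_destroy, NULL);
                nf_conntrack_cleanup_init_net();
        }
}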
canqun zhang Dec. 25, 2012, 10:45 a.m. UTC | #5
Thanks for your suggestion, I will modify this patch and run tests.

2012/12/25 Gao feng <gaofeng@cn.fujitsu.com>:
> On 2012/12/25 15:25, canqun zhang wrote:
>> Hi Gao feng
>> The stack information is as follows. The kernel panics because
>> nf_ct_destroy is NULL.
>>
>> Reproduction:
>> (1) start an lxc container
>> (2) iptables -t nat -A POSTROUTING -s 10.48.254.18 -o eth1 -j
>> MASQUERADE (run it on the host machine)
>> (3) /etc/init.d/iptables save (run it on the host machine)
>> (4) /etc/init.d/iptables restart (run it on the host machine)
>>
>
> Thanks!
> It seems that nf_conntrack_l[3,4]proto_unregister doesn't make sure that
> the nf_conns of the proto are destroyed.
>
> If I'm right, there is another problem even if your fix solves this panic.
> The l3/l4 proto will be unregistered before all of its nf_conns are destroyed.
> So even if nf_ct_destroy is not NULL, in destroy_conntrack we are not able to
> find the right l4proto; l4proto->destroy will be incorrect and resources will
> not be released correctly.
>
> So I think the root problem is that we do the register/set and the
> unregister/unset both on the first net (init_net). Maybe it's better to do the
> register/set on the first net and the unregister/unset on the last net.

Patch

diff -r 7884e663ef6f -r 57fd45b8a144 net/netfilter/nf_conntrack_core.c
--- a/net/netfilter/nf_conntrack_core.c Sun Dec 09 21:41:08 2012 +0800
+++ b/net/netfilter/nf_conntrack_core.c Sun Dec 23 16:28:15 2012 +0800
@@ -1122,7 +1122,22 @@ 

 static void nf_conntrack_cleanup_net(struct net *net)
 {
- i_see_dead_people:
+       if (net == &init_net) {
+               struct net *net_poll;
+               rcu_read_lock();
+               for_each_net_rcu(net_poll) {
+                       synchronize_net();
+               again:
+                       nf_ct_iterate_cleanup(net_poll, kill_all, NULL);
+                       nf_ct_release_dying_list(net_poll);
+                       if (atomic_read(&net_poll->ct.count) != 0) {
+                               schedule();
+                               goto again;
+                       }
+               }
+               rcu_read_unlock();
+       }
+i_see_dead_people: