Message ID | 20130722214553.GF6538@order.stressinduktion.org |
---|---|
State | Accepted, archived |
Delegated to: | David Miller |
Headers | show |
Hallo Hannes On 2013年07月23日 05:45, Hannes Frederic Sowa wrote: > Otherwise we end up dereferencing the already freed net->ipv6.mrt pointer > which leads to a panic (from Srivatsa S. Bhat): > > BUG: unable to handle kernel paging request at ffff882018552020 > IP: [<ffffffffa0366b02>] ip6mr_sk_done+0x32/0xb0 [ipv6] > PGD 290a067 PUD 207ffe0067 PMD 207ff1d067 PTE 8000002018552060 > Oops: 0000 [#1] SMP DEBUG_PAGEALLOC > Modules linked in: ebtable_nat ebtables nfs fscache nf_conntrack_ipv4 nf_defrag_ipv4 ipt_REJECT xt_CHECKSUM iptable_mangle iptable_filter ip_tables nfsd lockd nfs_acl exportfs auth_rpcgss autofs4 sunrpc 8021q garp bridge stp llc ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter > +ip6_tables ipv6 vfat fat vhost_net macvtap macvlan vhost tun kvm_intel kvm uinput iTCO_wdt iTCO_vendor_support cdc_ether usbnet mii microcode i2c_i801 i2c_core lpc_ich mfd_core shpchp ioatdma dca mlx4_core be2net wmi acpi_cpufreq mperf ext4 jbd2 mbcache dm_mirror dm_region_hash dm_log dm_mod > CPU: 0 PID: 7 Comm: kworker/u33:0 Not tainted 3.11.0-rc1-ea45e-a #4 > Hardware name: IBM -[8737R2A]-/00Y2738, BIOS -[B2E120RUS-1.20]- 11/30/2012 > Workqueue: netns cleanup_net > task: ffff8810393641c0 ti: ffff881039366000 task.ti: ffff881039366000 > RIP: 0010:[<ffffffffa0366b02>] [<ffffffffa0366b02>] ip6mr_sk_done+0x32/0xb0 [ipv6] > RSP: 0018:ffff881039367bd8 EFLAGS: 00010286 > RAX: ffff881039367fd8 RBX: ffff882018552000 RCX: dead000000200200 > RDX: 0000000000000000 RSI: ffff881039367b68 RDI: ffff881039367b68 > RBP: ffff881039367bf8 R08: ffff881039367b68 R09: 2222222222222222 > R10: 2222222222222222 R11: 2222222222222222 R12: ffff882015a7a040 > R13: ffff882014eb89c0 R14: ffff8820289e2800 R15: 0000000000000000 > FS: 0000000000000000(0000) GS:ffff88103fc00000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: ffff882018552020 CR3: 0000000001c0b000 CR4: 00000000000407f0 > Stack: > ffff881039367c18 ffff882014eb89c0 ffff882015e28c00 0000000000000000 > ffff881039367c18 ffffffffa034d9d1 ffff8820289e2800 ffff882014eb89c0 > ffff881039367c58 ffffffff815bdecb ffffffff815bddf2 ffff882014eb89c0 > Call Trace: > [<ffffffffa034d9d1>] rawv6_close+0x21/0x40 [ipv6] > [<ffffffff815bdecb>] inet_release+0xfb/0x220 > [<ffffffff815bddf2>] ? inet_release+0x22/0x220 > [<ffffffffa032686f>] inet6_release+0x3f/0x50 [ipv6] > [<ffffffff8151c1d9>] sock_release+0x29/0xa0 > [<ffffffff81525520>] sk_release_kernel+0x30/0x70 > [<ffffffffa034f14b>] icmpv6_sk_exit+0x3b/0x80 [ipv6] > [<ffffffff8152fff9>] ops_exit_list+0x39/0x60 > [<ffffffff815306fb>] cleanup_net+0xfb/0x1a0 > [<ffffffff81075e3a>] process_one_work+0x1da/0x610 > [<ffffffff81075dc9>] ? process_one_work+0x169/0x610 > [<ffffffff81076390>] worker_thread+0x120/0x3a0 > [<ffffffff81076270>] ? process_one_work+0x610/0x610 > [<ffffffff8107da2e>] kthread+0xee/0x100 > [<ffffffff8107d940>] ? __init_kthread_worker+0x70/0x70 > [<ffffffff8162a99c>] ret_from_fork+0x7c/0xb0 > [<ffffffff8107d940>] ? __init_kthread_worker+0x70/0x70 This call trace actually comes a long way down from put_net, indicating net reference count is ZERO, then clean all things up. Since Srivatsa didn't enable CONFIG_IPV6_MROUTE_MULTIPLE_TABLES, so mrt could only be net->ipv6.mrt6, which is allocated in ip6mr_rules_init. The only place to free mrt6 is ip6mr_rules_exit in this configuration. Question is how could mrt6 be freed in ip6mr_rules_exit, and then put_net has decreased net->count to zero and then free all things up, all of this happened in boot up process? Even if mrt6 has been freed, subsequent access on that area didn't necessarily cause page fault, PTE 8000002018552060 shows the mapping is not present at all. I think SLAB did tired down page entry for kfreed area once pte has been setup already. The scenario is weird for me, or my ignorance reach a new limit :) > Code: 20 48 89 5d e8 4c 89 65 f0 4c 89 6d f8 66 66 66 66 90 4c 8b 67 30 49 89 fd e8 db 3c 1e e1 49 8b 9c 24 90 08 00 00 48 85 db 74 06<4c> 39 6b 20 74 20 bb f3 ff ff ff e8 8e 3c 1e e1 89 d8 4c 8b 65 > RIP [<ffffffffa0366b02>] ip6mr_sk_done+0x32/0xb0 [ipv6] > RSP<ffff881039367bd8> > CR2: ffff882018552020 > > Reported-by: Srivatsa S. Bhat<srivatsa.bhat@linux.vnet.ibm.com> > Tested-by: Srivatsa S. Bhat<srivatsa.bhat@linux.vnet.ibm.com> > Signed-off-by: Hannes Frederic Sowa<hannes@stressinduktion.org> > --- > net/ipv6/ip6mr.c | 5 +++++ > 1 file changed, 5 insertions(+) > > diff --git a/net/ipv6/ip6mr.c b/net/ipv6/ip6mr.c > index 583e8d4..03986d3 100644 > --- a/net/ipv6/ip6mr.c > +++ b/net/ipv6/ip6mr.c > @@ -259,10 +259,12 @@ static void __net_exit ip6mr_rules_exit(struct net *net) > { > struct mr6_table *mrt, *next; > > + rtnl_lock(); > list_for_each_entry_safe(mrt, next,&net->ipv6.mr6_tables, list) { > list_del(&mrt->list); > ip6mr_free_table(mrt); > } > + rtnl_unlock(); > fib_rules_unregister(net->ipv6.mr6_rules_ops); > } > #else > @@ -289,7 +291,10 @@ static int __net_init ip6mr_rules_init(struct net *net) > > static void __net_exit ip6mr_rules_exit(struct net *net) > { > + rtnl_lock(); > ip6mr_free_table(net->ipv6.mrt6); > + net->ipv6.mrt6 = NULL; > + rtnl_unlock(); > } > #endif >
On Tue, Jul 23, 2013 at 04:28:14PM +0800, Fan Du wrote: > This call trace actually comes a long way down from put_net, indicating net > reference count is ZERO, then clean all things up. Since Srivatsa didn't > enable CONFIG_IPV6_MROUTE_MULTIPLE_TABLES, so mrt could only be > net->ipv6.mrt6, > which is allocated in ip6mr_rules_init. The only place to free mrt6 is > ip6mr_rules_exit in this configuration. > > Question is how could mrt6 be freed in ip6mr_rules_exit, and then put_net > has > decreased net->count to zero and then free all things up, all of this > happened > in boot up process? Even if mrt6 has been freed, subsequent access on that > area > didn't necessarily cause page fault, PTE 8000002018552060 shows the mapping > is > not present at all. I think SLAB did tired down page entry for kfreed area > once > pte has been setup already. > > The scenario is weird for me, or my ignorance reach a new limit :) The important thing is: net->ipv6.mrt6 = NULL; After we freed the mrt6 table we leave the pointer value intact. So when the namespace cleanup calls the next callback and the icmpv6 socket closes, we try to check mrt->mroute6_sk again. See how ip6mr_for_each_table expands if you compile without CONFIG_IPV6_MROUTE_MULTIPLE_TABLES: #define ip6mr_for_each_table(mrt, net) \ for (mrt = net->ipv6.mrt6; mrt; mrt = NULL) It is merely a NULL check. Does it make sense now? I added the rtnl_lock for ip6mr_free_table ==> mroute_clean_tables ==> mif6_delete and the lock has nothing to do with this "race", as the netspace cleanup is serialized in the workqueue. Greetings, Hannes -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Hannes Frederic Sowa <hannes@stressinduktion.org> Date: Mon, 22 Jul 2013 23:45:53 +0200 > Otherwise we end up dereferencing the already freed net->ipv6.mrt pointer > which leads to a panic (from Srivatsa S. Bhat): ... > Reported-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> > Tested-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> > Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Applied and queued up for -stable, thanks! -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/net/ipv6/ip6mr.c b/net/ipv6/ip6mr.c index 583e8d4..03986d3 100644 --- a/net/ipv6/ip6mr.c +++ b/net/ipv6/ip6mr.c @@ -259,10 +259,12 @@ static void __net_exit ip6mr_rules_exit(struct net *net) { struct mr6_table *mrt, *next; + rtnl_lock(); list_for_each_entry_safe(mrt, next, &net->ipv6.mr6_tables, list) { list_del(&mrt->list); ip6mr_free_table(mrt); } + rtnl_unlock(); fib_rules_unregister(net->ipv6.mr6_rules_ops); } #else @@ -289,7 +291,10 @@ static int __net_init ip6mr_rules_init(struct net *net) static void __net_exit ip6mr_rules_exit(struct net *net) { + rtnl_lock(); ip6mr_free_table(net->ipv6.mrt6); + net->ipv6.mrt6 = NULL; + rtnl_unlock(); } #endif