System freeze on reboot - general protection fault

Submitter Zdenek Kabelac
Date Sept. 2, 2009, 9:45 p.m.
Message ID <c4e36d110909021445l5f44183es9e338437dfbbd195@mail.gmail.com>
Permalink /patch/32851/
State RFC
Delegated to: David Miller

Comments

Zdenek Kabelac - Sept. 2, 2009, 9:45 p.m.
2009/8/17 Patrick McHardy <kaber@trash.net>:
> Eric Dumazet wrote:
>> Zdenek Kabelac wrote:
>>>  [<ffffffffa02c502f>] nf_conntrack_ftp_fini+0x2f/0x70 [nf_conntrack_ftp]
>>>  [<ffffffff8027bcc5>] sys_delete_module+0x1a5/0x270
>>>  [<ffffffff8020d329>] ? retint_swapgs+0xe/0x13
>>>  [<ffffffff80271bf2>] ? trace_hardirqs_on_caller+0x162/0x1b0
>>>  [<ffffffff80292121>] ? audit_syscall_entry+0x191/0x1c0
>>>  [<ffffffff80526dae>] ? trace_hardirqs_on_thunk+0x3a/0x3f
>>>  [<ffffffff8020c84b>] system_call_fastpath+0x16/0x1b
>>> Code: c6 00 00 0f 82 66 ff ff ff 49 8b 9e d8 05 00 00 48 85 db 75 16
>>> e9 8e 00 00 00 0f 1f 44 00 00 48 85 c0 0f 84 80 00 00 00 48 89 c3 <0f>
>>> b6 4b 37 48 8b 03 48 8d 14 cd 00 00 00 00 0f 18 08 48 29 ca
>>> RIP  [<ffffffffa02b2c2c>] nf_conntrack_helper_unregister+0x16c/0x320
>>> [nf_conntrack]
>>>  RSP <ffff88013982fe68>
>>> CR2: 0000000000000038
>>> ---[ end trace bc3a0ede3d0084db ]---
>>>
>> I am currently traveling and won't be able to help you before next week.
>>
>> I added netdev, Patrick, and netfilter-devel in CC so that more eyes can take a look.
>
> Thanks for the report, I'll have a look at this. Zdenek, please
> send me the nf_conntrack.ko file used in the above oops. Thanks.
>

Ok

I've found the solution for my problem.

http://thread.gmane.org/gmane.comp.security.firewalls.netfilter.devel/30483

I've made this small fix from this thread:



As the thread "nf_conntrack: Use rcu_barrier() and fix kmem_cache_create flags"
seems to be somewhat 'unfinished' and already a bit old, and I've no
idea whether it actually fixes the problem completely or just hides it in
my case, I'm leaving it to some RCU gurus to fix this issue.

All I can say is that this extra rcu_barrier() and the removal of
SLAB_DESTROY_BY_RCU remove my GPF on reboot.

Zdenek
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet - Sept. 2, 2009, 10:17 p.m.
Zdenek Kabelac wrote:
> Ok
> 
> I've found the solution for my problem.
> 
> http://thread.gmane.org/gmane.comp.security.firewalls.netfilter.devel/30483
> 
> I've made this small fix from this thread:
> 
> diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core
> index b5869b9..68488f8 100644
> --- a/net/netfilter/nf_conntrack_core.c
> +++ b/net/netfilter/nf_conntrack_core.c
> @@ -1108,6 +1108,7 @@ static void nf_conntrack_cleanup_init_net(void)
>  {
>         nf_conntrack_helper_fini();
>         nf_conntrack_proto_fini();
> +       rcu_barrier();
>         kmem_cache_destroy(nf_conntrack_cachep);
>  }
> 
> @@ -1266,7 +1267,7 @@ static int nf_conntrack_init_init_net(void)
> 
>         nf_conntrack_cachep = kmem_cache_create("nf_conntrack",
>                                                 sizeof(struct nf_conn),
> -                                               0, SLAB_DESTROY_BY_RCU, NULL);
> +                                               0, 0, NULL);
>         if (!nf_conntrack_cachep) {
>                 printk(KERN_ERR "Unable to create nf_conn slab cache\n");
>                 ret = -ENOMEM;
> 
> 
> As the thread "nf_conntrack: Use rcu_barrier() and fix kmem_cache_create flags"
> seems to be somewhat 'unfinished' and already a bit old, and I've no
> idea whether it actually fixes the problem completely or just hides it in
> my case, I'm leaving it to some RCU gurus to fix this issue.
> 
> All I can say is that this extra rcu_barrier() and the removal of
> SLAB_DESTROY_BY_RCU remove my GPF on reboot.
> 
> Zdenek

Ouch..

Don't think such a patch makes your kernel better; it'll crash too.

You cannot remove SLAB_DESTROY_BY_RCU like this, it's there for very good reasons.
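
Eric does not spell out those reasons in this message. As general
background (not taken from this thread): a SLAB_DESTROY_BY_RCU cache keeps
its memory type-stable for RCU readers, so an object can be freed and
reused for a different connection while a lockless reader still holds a
pointer to it, but the memory remains a valid object of the same type.
Readers are therefore expected to pin the object and then re-check its
identity. What follows is a single-threaded userspace toy model of that
pattern, not kernel code; every toy_* name is hypothetical, and plain ints
stand in for atomic counters.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical object in type-stable memory: the slot may be recycled
 * for another key at any time, but it always stays a struct toy_conn. */
struct toy_conn {
	int refcnt;	/* 0 means free or being freed */
	int key;	/* identity the reader is searching for */
};

/* Take a reference unless the object is already dead.
 * Models an "increment if not zero" operation; not atomic here. */
static bool toy_get_unless_zero(struct toy_conn *c)
{
	if (c->refcnt == 0)
		return false;
	c->refcnt++;
	return true;
}

/* Drop a reference taken by toy_get_unless_zero(). */
static void toy_put(struct toy_conn *c)
{
	assert(c->refcnt > 0);
	c->refcnt--;
}

/* Test hook invoked between finding a slot and pinning it, to simulate
 * a concurrent free-and-reuse of that slot. */
typedef void (*toy_race_fn)(struct toy_conn *);

/* Lookup that tolerates recycling: pin first, then re-check the key.
 * Returns NULL on a lost race; real code would usually retry instead. */
static struct toy_conn *toy_lookup(struct toy_conn *tbl, size_t n, int key,
				   toy_race_fn race)
{
	for (size_t i = 0; i < n; i++) {
		struct toy_conn *c = &tbl[i];

		if (c->key != key)
			continue;
		if (race)
			race(c);		/* slot may change under us */
		if (!toy_get_unless_zero(c))
			return NULL;		/* object was being freed */
		if (c->key != key) {		/* recycled for another key */
			toy_put(c);
			return NULL;
		}
		return c;			/* pinned and still the right one */
	}
	return NULL;
}

/* Simulated writer: frees the slot and reuses it for key 99. */
static void toy_recycle(struct toy_conn *c)
{
	c->refcnt = 1;
	c->key = 99;
}
```

In this model a lookup that races with toy_recycle() returns NULL rather
than a connection with the wrong key. The re-check is only meaningful
because the memory is guaranteed to remain a toy_conn; as far as I
understand it, that guarantee is what the flag provides to lockless
readers in the real allocator.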

Zdenek Kabelac - Sept. 2, 2009, 10:31 p.m.
2009/9/3 Eric Dumazet <eric.dumazet@gmail.com>:
> Ouch..
>
> Don't think such a patch makes your kernel better; it'll crash too.
>
> You cannot remove SLAB_DESTROY_BY_RCU like this, it's there for very good reasons.
>

Well, I'm not noticing any ill behavior. Also note that rcu_barrier() is
there before the cache is destroyed.
But as I said, it's just my shot in the dark, which seems to work for me...

Zdenek
Paul E. McKenney - Sept. 3, 2009, 6:17 p.m.
On Thu, Sep 03, 2009 at 12:17:43AM +0200, Eric Dumazet wrote:
> Ouch..
> 
> Don't think such a patch makes your kernel better; it'll crash too.
> 
> You cannot remove SLAB_DESTROY_BY_RCU like this, it's there for very good reasons.

And if I understand correctly, this is more evidence that
kmem_cache_destroy() needs to do an rcu_barrier() in the
SLAB_DESTROY_BY_RCU case.

							Thanx, Paul
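
Paul's point can be illustrated with a small userspace toy model (not
kernel code; all toy_* names are hypothetical). It assumes, as his remark
implies, that a SLAB_DESTROY_BY_RCU cache may still have page releases
queued as RCU callbacks when the cache is torn down, and it relies on the
fact that rcu_barrier() waits for already-queued callbacks to finish.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_PENDING 8

/* Hypothetical stand-in for a slab cache: counts pages not yet released. */
struct toy_cache {
	int live_pages;
};

typedef void (*toy_cb_fn)(struct toy_cache *);

/* Queue of deferred callbacks, standing in for call_rcu() work. */
static struct {
	toy_cb_fn fn;
	struct toy_cache *cache;
} toy_pending[TOY_MAX_PENDING];
static size_t toy_npending;

/* Defer fn(c) until "after a grace period"; false if the queue is full. */
static bool toy_call_rcu(toy_cb_fn fn, struct toy_cache *c)
{
	if (toy_npending == TOY_MAX_PENDING)
		return false;
	toy_pending[toy_npending].fn = fn;
	toy_pending[toy_npending].cache = c;
	toy_npending++;
	return true;
}

/* Run every callback queued so far, then return (models rcu_barrier()). */
static void toy_rcu_barrier(void)
{
	for (size_t i = 0; i < toy_npending; i++)
		toy_pending[i].fn(toy_pending[i].cache);
	toy_npending = 0;
}

/* Deferred callback: actually give one page back. */
static void toy_release_page(struct toy_cache *c)
{
	assert(c->live_pages > 0);
	c->live_pages--;
}

/* Returns true only if teardown is safe, i.e. no page release is still
 * outstanding. With barrier_first set, pending callbacks are drained
 * before the check. */
static bool toy_cache_destroy(struct toy_cache *c, bool barrier_first)
{
	if (barrier_first)
		toy_rcu_barrier();
	return c->live_pages == 0;
}
```

With two releases queued, toy_cache_destroy(&c, false) reports an unsafe
teardown, while toy_cache_destroy(&c, true) drains the queue first and
succeeds. Placing the barrier inside the destroy routine, rather than in
each caller, is the design Paul is arguing for.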

Patch

diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core
index b5869b9..68488f8 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -1108,6 +1108,7 @@  static void nf_conntrack_cleanup_init_net(void)
 {
        nf_conntrack_helper_fini();
        nf_conntrack_proto_fini();
+       rcu_barrier();
        kmem_cache_destroy(nf_conntrack_cachep);
 }

@@ -1266,7 +1267,7 @@  static int nf_conntrack_init_init_net(void)

        nf_conntrack_cachep = kmem_cache_create("nf_conntrack",
                                                sizeof(struct nf_conn),
-                                               0, SLAB_DESTROY_BY_RCU, NULL);
+                                               0, 0, NULL);
        if (!nf_conntrack_cachep) {
                printk(KERN_ERR "Unable to create nf_conn slab cache\n");
                ret = -ENOMEM;