
nf_nat_pptp 4.12.3 kernel lockup/reboot

Message ID 20170724161944.GB23964@breakpoint.cc
State Awaiting Upstream, archived
Delegated to: David Miller

Commit Message

Florian Westphal July 24, 2017, 4:19 p.m. UTC
Denys Fedoryshchenko <nuclearcat@nuclearcat.com> wrote:
> Hi,
> 
> I am trying to upgrade the kernel from 4.11.8 to 4.12.3 (the box is a NAT router
> handling approx. 2 Gbps of PPPoE user traffic) and noticed that after a while
> the server reboots (I have set reboot on panic, etc.).
> I can't run a serial console, and there is nothing in pstore / netconsole.
> The best I got is a very short soft lockup message in IPMI, but as
> storage there is very limited, it is nearly useless.
> 
> From preliminary testing (I can't do much, as it's production), it seems
> the following lines cause the issue; they worked in 4.11.8 but no longer in 4.12.3.

Wild guess here, does this help?

Comments

Florian Westphal July 24, 2017, 4:20 p.m. UTC | #1
Florian Westphal <fw@strlen.de> wrote:
> Wild guess here, does this help?
> 
> diff --git a/net/netfilter/nf_conntrack_helper.c b/net/netfilter/nf_conntrack_helper.c
> --- a/net/netfilter/nf_conntrack_helper.c
> +++ b/net/netfilter/nf_conntrack_helper.c
> @@ -266,6 +266,8 @@ int __nf_ct_try_assign_helper(struct nf_conn *ct, struct nf_conn *tmpl,
>                 help = nf_ct_helper_ext_add(ct, helper, flags);
>                 if (help == NULL)
>                         return -ENOMEM;
> +              	if (!nf_ct_ext_add(ct, NF_CT_EXT_NAT, flags));

Sigh, stupid typo: there should be no ';' at the end of the line above.
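The typo is C's empty-statement pitfall. The self-contained sketch below uses hypothetical names (`fake_ext_add()` merely stands in for an allocator such as `nf_ct_ext_add()` and is not a kernel API) to show how the stray `;` turns the `if` into a no-op, so `return -ENOMEM` runs even when the allocation succeeds:

```c
#include <assert.h> /* for the example checks after this block */
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an extension allocator such as
 * nf_ct_ext_add(): returns non-NULL on success, NULL on failure. */
static void *fake_ext_add(bool succeed)
{
	static int slot;

	return succeed ? &slot : NULL;
}

/* With the stray ';' the if statement has an empty body, so the
 * return below it executes unconditionally, even on success. */
static int assign_with_typo(bool succeed)
{
	if (!fake_ext_add(succeed));
		return -ENOMEM;
	return 0; /* unreachable */
}

/* Intended form: fail only when the allocation actually failed. */
static int assign_fixed(bool succeed)
{
	if (!fake_ext_add(succeed))
		return -ENOMEM;
	return 0;
}
```

For example, `assert(assign_with_typo(true) == -ENOMEM)` holds although the allocation succeeded, while `assign_fixed(true)` returns 0. Compilers can usually flag this pattern; GCC's `-Wempty-body` (enabled by `-Wextra`) warns about an empty `if` body.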
Denys Fedoryshchenko July 25, 2017, 7:27 a.m. UTC | #2
On 2017-07-24 19:20, Florian Westphal wrote:
> Florian Westphal <fw@strlen.de> wrote:
>> Wild guess here, does this help?
>> 
>> diff --git a/net/netfilter/nf_conntrack_helper.c b/net/netfilter/nf_conntrack_helper.c
>> --- a/net/netfilter/nf_conntrack_helper.c
>> +++ b/net/netfilter/nf_conntrack_helper.c
>> @@ -266,6 +266,8 @@ int __nf_ct_try_assign_helper(struct nf_conn *ct, struct nf_conn *tmpl,
>>                 help = nf_ct_helper_ext_add(ct, helper, flags);
>>                 if (help == NULL)
>>                         return -ENOMEM;
>> +              	if (!nf_ct_ext_add(ct, NF_CT_EXT_NAT, flags));
> 
> Sigh, stupid typo: there should be no ';' at the end of the line above.

Tested; it no longer seems to hang (before, it hung within
10 minutes).
I will probably wait for a 24h test cycle.
Denys Fedoryshchenko July 27, 2017, 6:29 a.m. UTC | #3
On 2017-07-24 19:20, Florian Westphal wrote:
> Florian Westphal <fw@strlen.de> wrote:
>> Wild guess here, does this help?
>> 
>> diff --git a/net/netfilter/nf_conntrack_helper.c b/net/netfilter/nf_conntrack_helper.c
>> --- a/net/netfilter/nf_conntrack_helper.c
>> +++ b/net/netfilter/nf_conntrack_helper.c
>> @@ -266,6 +266,8 @@ int __nf_ct_try_assign_helper(struct nf_conn *ct, struct nf_conn *tmpl,
>>                 help = nf_ct_helper_ext_add(ct, helper, flags);
>>                 if (help == NULL)
>>                         return -ENOMEM;
>> +              	if (!nf_ct_ext_add(ct, NF_CT_EXT_NAT, flags));
> 
> Sigh, stupid typo: there should be no ';' at the end of the line above.

Tested-by: Denys Fedoryshchenko <nuclearcat@nuclearcat.com>

Tested, and no more hangs for 2 days; definitely an improvement.
Any chance it will go into stable 4.12.x and the next kernel?

Thank you very much!
Denys Fedoryshchenko Aug. 25, 2017, 2:58 a.m. UTC | #4
On 2017-07-24 19:20, Florian Westphal wrote:
> Florian Westphal <fw@strlen.de> wrote:
>> Wild guess here, does this help?
>> 
>> diff --git a/net/netfilter/nf_conntrack_helper.c b/net/netfilter/nf_conntrack_helper.c
>> --- a/net/netfilter/nf_conntrack_helper.c
>> +++ b/net/netfilter/nf_conntrack_helper.c
>> @@ -266,6 +266,8 @@ int __nf_ct_try_assign_helper(struct nf_conn *ct, struct nf_conn *tmpl,
>>                 help = nf_ct_helper_ext_add(ct, helper, flags);
>>                 if (help == NULL)
>>                         return -ENOMEM;
>> +              	if (!nf_ct_ext_add(ct, NF_CT_EXT_NAT, flags));
> 
> Sigh, stupid typo: there should be no ';' at the end of the line above.
Sorry, are there any plans to push this to the 4.12 stable queue?
Florian Westphal Aug. 25, 2017, 5:21 a.m. UTC | #5
Denys Fedoryshchenko <nuclearcat@nuclearcat.com> wrote:
> >>Wild guess here, does this help?
> >>
> >>diff --git a/net/netfilter/nf_conntrack_helper.c b/net/netfilter/nf_conntrack_helper.c
> >>--- a/net/netfilter/nf_conntrack_helper.c
> >>+++ b/net/netfilter/nf_conntrack_helper.c
> >>@@ -266,6 +266,8 @@ int __nf_ct_try_assign_helper(struct nf_conn *ct, struct nf_conn *tmpl,
> >>                help = nf_ct_helper_ext_add(ct, helper, flags);
> >>                if (help == NULL)
> >>                        return -ENOMEM;
> >>+              	if (!nf_ct_ext_add(ct, NF_CT_EXT_NAT, flags));
> >
> >Sigh, stupid typo: there should be no ';' at the end of the line above.
> Sorry, are there any plans to push this to the 4.12 stable queue?

No, sorry. This patch adds the extension to all connections
that use a helper, but the NAT extension is only used/required by the PPTP
helper (and masquerade).

The thing is, this patch should not be needed. I will have
to review PPTP again; maybe I missed a case where the extension is not
added.
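The objection is about per-connection cost: allocating the NAT extension in `__nf_ct_try_assign_helper()` attaches it to every helper-assisted connection, although only the PPTP helper (and masquerade) need it. The toy model below is a hypothetical userspace sketch, not kernel code: a bitmask stands in for extension memory, and the `needs_nat` flag on `struct helper` is an assumed field used only to contrast the broad allocation with allocating just where a helper declares it needs NAT.

```c
#include <assert.h> /* for the example checks after this block */
#include <errno.h>
#include <stdbool.h>

/* Toy model, NOT kernel code: each bit marks an attached extension. */
enum ext_id {
	EXT_HELPER = 1 << 0,
	EXT_NAT    = 1 << 1,
};

struct conn {
	unsigned int exts; /* bitmask of attached extensions */
};

/* Hypothetical helper descriptor; needs_nat is an assumed flag for
 * helpers (like PPTP in this thread) that rely on the NAT extension. */
struct helper {
	const char *name;
	bool needs_nat;
};

/* Attach an extension; allocation always succeeds in this model. */
static int ext_add(struct conn *ct, enum ext_id id)
{
	ct->exts |= (unsigned int)id;
	return 0;
}

/* Broad variant (what the test patch does): every connection that
 * gets a helper also gets the NAT extension. */
static int assign_always_nat(struct conn *ct, const struct helper *h)
{
	(void)h;
	if (ext_add(ct, EXT_HELPER) || ext_add(ct, EXT_NAT))
		return -ENOMEM;
	return 0;
}

/* Narrow variant: only helpers that declare needs_nat get it. */
static int assign_nat_if_needed(struct conn *ct, const struct helper *h)
{
	if (ext_add(ct, EXT_HELPER))
		return -ENOMEM;
	if (h->needs_nat && ext_add(ct, EXT_NAT))
		return -ENOMEM;
	return 0;
}
```

In this model, a connection whose helper has `needs_nat` unset ends up with only `EXT_HELPER` under the narrow variant but with both bits under the broad one; in the kernel, that second bit corresponds to extra memory on every helper-assisted connection, which is the overhead the objection points at.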

Do you happen to have an oops backtrace?

That might speed this up a bit.
Denys Fedoryshchenko Aug. 25, 2017, 7:15 a.m. UTC | #6
On 2017-08-25 08:21, Florian Westphal wrote:
> No, sorry. This patch adds the extension to all connections
> that use a helper, but the NAT extension is only used/required by the PPTP
> helper (and masquerade).
> 
> The thing is, this patch should not be needed. I will have
> to review PPTP again; maybe I missed a case where the extension is not
> added.
> 
> Do you happen to have an oops backtrace?
> 
> That might speed this up a bit.
There is nothing in netconsole, and nothing in ERST pstore either; I found the
cause just by guessing.
It's also totally headless (no screen, no serial console).
I can try attaching a USB serial adapter for a serial console, but I'm not sure
it will help.
If there is any other way to catch it, I can try it, but as it's a
production server, I can't "crash it" more than once per day.

Patch

diff --git a/net/netfilter/nf_conntrack_helper.c b/net/netfilter/nf_conntrack_helper.c
--- a/net/netfilter/nf_conntrack_helper.c
+++ b/net/netfilter/nf_conntrack_helper.c
@@ -266,6 +266,8 @@  int __nf_ct_try_assign_helper(struct nf_conn *ct, struct nf_conn *tmpl,
                help = nf_ct_helper_ext_add(ct, helper, flags);
                if (help == NULL)
                        return -ENOMEM;
+              if (!nf_ct_ext_add(ct, NF_CT_EXT_NAT, flags))
+                       return -ENOMEM;
        } else {
                /* We only allow helper re-assignment of the same sort since
                 * we cannot reallocate the helper extension area.