diff mbox

4.1.0, kernel panic, pppoe_release

Message ID 20150910155657.GC3661@alphalink.fr
State RFC, archived
Delegated to: David Miller
Headers show

Commit Message

Guillaume Nault Sept. 10, 2015, 3:56 p.m. UTC
On Fri, Jul 17, 2015 at 09:16:14PM +0300, Denys Fedoryshchenko wrote:
> Probably my knowledge of kernel is not sufficient, but i will try few
> approaches.
> One of them to add to pppoe_unbind_sock_work:
> 
>         pppox_unbind_sock(sk);
>         +/* Signal the death of the socket. */
>         +sk->sk_state = PPPOX_DEAD;
>
I don't believe this will fix anything. pppox_unbind_sock() already
sets sk->sk_state when necessary.

> I will wait first, to make sure this patch was causing kernel panic (it
> needs 24h testing cycle), then i will try this fix.
> 
I suspect the problem goes with actions performed on the underlying
interface (MAC address, MTU or link state update). This triggers
pppoe_flush_dev(), which cleans up the device without announcing it
in sk->sk_state.

Can you pleas try the following patch?

---
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Denys Fedoryshchenko Sept. 22, 2015, 1:47 a.m. UTC | #1
Hi,
Sorry for late reply, was not able to push new kernel on pppoes without 
permissions (it's production servers), just got OK.

I am testing patch on another pppoe server with 9k users, for ~3 days, 
seems fine. I will test today
also on server that was experiencing crashes within 1 day.

On 2015-09-10 18:56, Guillaume Nault wrote:
> On Fri, Jul 17, 2015 at 09:16:14PM +0300, Denys Fedoryshchenko wrote:
>> Probably my knowledge of kernel is not sufficient, but i will try few
>> approaches.
>> One of them to add to pppoe_unbind_sock_work:
>> 
>>         pppox_unbind_sock(sk);
>>         +/* Signal the death of the socket. */
>>         +sk->sk_state = PPPOX_DEAD;
>> 
> I don't believe this will fix anything. pppox_unbind_sock() already
> sets sk->sk_state when necessary.
> 
>> I will wait first, to make sure this patch was causing kernel panic 
>> (it
>> needs 24h testing cycle), then i will try this fix.
>> 
> I suspect the problem goes with actions performed on the underlying
> interface (MAC address, MTU or link state update). This triggers
> pppoe_flush_dev(), which cleans up the device without announcing it
> in sk->sk_state.
> 
> Can you pleas try the following patch?
> 
> ---
> diff --git a/drivers/net/ppp/pppoe.c b/drivers/net/ppp/pppoe.c
> index 3837ae3..2ed7506 100644
> --- a/drivers/net/ppp/pppoe.c
> +++ b/drivers/net/ppp/pppoe.c
> @@ -313,7 +313,6 @@ static void pppoe_flush_dev(struct net_device *dev)
>  			if (po->pppoe_dev == dev &&
>  			    sk->sk_state & (PPPOX_CONNECTED | PPPOX_BOUND | PPPOX_ZOMBIE)) 
> {
>  				pppox_unbind_sock(sk);
> -				sk->sk_state = PPPOX_ZOMBIE;
>  				sk->sk_state_change(sk);
>  				po->pppoe_dev = NULL;
>  				dev_put(dev);
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Guillaume Nault Sept. 25, 2015, 2:38 p.m. UTC | #2
On Tue, Sep 22, 2015 at 04:47:48AM +0300, Denys Fedoryshchenko wrote:
> Hi,
> Sorry for late reply, was not able to push new kernel on pppoes without
> permissions (it's production servers), just got OK.
> 
> I am testing patch on another pppoe server with 9k users, for ~3 days, seems
> fine. I will test today
> also on server that was experiencing crashes within 1 day.
>
Thanks for the feedback. I'm about to submit a fix. Should I add a
Tested-by tag for you?
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Denys Fedoryshchenko Sept. 25, 2015, 3:02 p.m. UTC | #3
On 2015-09-25 17:38, Guillaume Nault wrote:
> On Tue, Sep 22, 2015 at 04:47:48AM +0300, Denys Fedoryshchenko wrote:
>> Hi,
>> Sorry for late reply, was not able to push new kernel on pppoes 
>> without
>> permissions (it's production servers), just got OK.
>> 
>> I am testing patch on another pppoe server with 9k users, for ~3 days, 
>> seems
>> fine. I will test today
>> also on server that was experiencing crashes within 1 day.
>> 
> Thanks for the feedback. I'm about to submit a fix. Should I add a
> Tested-by tag for you?
On one of servers i got same crash as before, within hours. 9k users 
server also crashed after while, so it seems it doesn't help.
I will do some more tests tomorrow.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Guillaume Nault Sept. 25, 2015, 7:02 p.m. UTC | #4
On Fri, Sep 25, 2015 at 06:02:42PM +0300, Denys Fedoryshchenko wrote:
> On 2015-09-25 17:38, Guillaume Nault wrote:
> >On Tue, Sep 22, 2015 at 04:47:48AM +0300, Denys Fedoryshchenko wrote:
> >>Hi,
> >>Sorry for late reply, was not able to push new kernel on pppoes without
> >>permissions (it's production servers), just got OK.
> >>
> >>I am testing patch on another pppoe server with 9k users, for ~3 days,
> >>seems
> >>fine. I will test today
> >>also on server that was experiencing crashes within 1 day.
> >>
> >Thanks for the feedback. I'm about to submit a fix. Should I add a
> >Tested-by tag for you?
> On one of servers i got same crash as before, within hours. 9k users server
> also crashed after while, so it seems it doesn't help.
> I will do some more tests tomorrow.
Ok, this must be a different bug then. Do you have a trace of a crash
with the patched kernel?
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/net/ppp/pppoe.c b/drivers/net/ppp/pppoe.c
index 3837ae3..2ed7506 100644
--- a/drivers/net/ppp/pppoe.c
+++ b/drivers/net/ppp/pppoe.c
@@ -313,7 +313,6 @@  static void pppoe_flush_dev(struct net_device *dev)
 			if (po->pppoe_dev == dev &&
 			    sk->sk_state & (PPPOX_CONNECTED | PPPOX_BOUND | PPPOX_ZOMBIE)) {
 				pppox_unbind_sock(sk);
-				sk->sk_state = PPPOX_ZOMBIE;
 				sk->sk_state_change(sk);
 				po->pppoe_dev = NULL;
 				dev_put(dev);