diff mbox

[net] ppp: don't override sk->sk_state in pppoe_flush_dev()

Message ID b7fbf103cd589741e3938550e7cf0f3684d8951c.1443605079.git.g.nault@alphalink.fr
State Accepted, archived
Delegated to: David Miller
Headers show

Commit Message

Guillaume Nault Sept. 30, 2015, 9:45 a.m. UTC
Since commit 2b018d57ff18 ("pppoe: drop PPPOX_ZOMBIEs in pppoe_release"),
pppoe_release() calls dev_put(po->pppoe_dev) if sk is in the
PPPOX_ZOMBIE state. But pppoe_flush_dev() can set sk->sk_state to
PPPOX_ZOMBIE _and_ reset po->pppoe_dev to NULL. This leads to the
following oops:

[  570.140800] BUG: unable to handle kernel NULL pointer dereference at 00000000000004e0
[  570.142931] IP: [<ffffffffa018c701>] pppoe_release+0x50/0x101 [pppoe]
[  570.144601] PGD 3d119067 PUD 3dbc1067 PMD 0
[  570.144601] Oops: 0000 [#1] SMP
[  570.144601] Modules linked in: l2tp_ppp l2tp_netlink l2tp_core ip6_udp_tunnel udp_tunnel pppoe pppox ppp_generic slhc loop crc32c_intel ghash_clmulni_intel jitterentropy_rng sha256_generic hmac drbg ansi_cprng aesni_intel aes_x86_64 ablk_helper cryptd lrw gf128mul glue_helper acpi_cpufreq evdev serio_raw processor button ext4 crc16 mbcache jbd2 virtio_net virtio_blk virtio_pci virtio_ring virtio
[  570.144601] CPU: 1 PID: 15738 Comm: ppp-apitest Not tainted 4.2.0 #1
[  570.144601] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Debian-1.8.2-1 04/01/2014
[  570.144601] task: ffff88003d30d600 ti: ffff880036b60000 task.ti: ffff880036b60000
[  570.144601] RIP: 0010:[<ffffffffa018c701>]  [<ffffffffa018c701>] pppoe_release+0x50/0x101 [pppoe]
[  570.144601] RSP: 0018:ffff880036b63e08  EFLAGS: 00010202
[  570.144601] RAX: 0000000000000000 RBX: ffff880034340000 RCX: 0000000000000206
[  570.144601] RDX: 0000000000000006 RSI: ffff88003d30dd20 RDI: ffff88003d30dd20
[  570.144601] RBP: ffff880036b63e28 R08: 0000000000000001 R09: 0000000000000000
[  570.144601] R10: 00007ffee9b50420 R11: ffff880034340078 R12: ffff8800387ec780
[  570.144601] R13: ffff8800387ec7b0 R14: ffff88003e222aa0 R15: ffff8800387ec7b0
[  570.144601] FS:  00007f5672f48700(0000) GS:ffff88003fc80000(0000) knlGS:0000000000000000
[  570.144601] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  570.144601] CR2: 00000000000004e0 CR3: 0000000037f7e000 CR4: 00000000000406a0
[  570.144601] Stack:
[  570.144601]  ffffffffa018f240 ffff8800387ec780 ffffffffa018f240 ffff8800387ec7b0
[  570.144601]  ffff880036b63e48 ffffffff812caabe ffff880039e4e000 0000000000000008
[  570.144601]  ffff880036b63e58 ffffffff812cabad ffff880036b63ea8 ffffffff811347f5
[  570.144601] Call Trace:
[  570.144601]  [<ffffffff812caabe>] sock_release+0x1a/0x75
[  570.144601]  [<ffffffff812cabad>] sock_close+0xd/0x11
[  570.144601]  [<ffffffff811347f5>] __fput+0xff/0x1a5
[  570.144601]  [<ffffffff811348cb>] ____fput+0x9/0xb
[  570.144601]  [<ffffffff81056682>] task_work_run+0x66/0x90
[  570.144601]  [<ffffffff8100189e>] prepare_exit_to_usermode+0x8c/0xa7
[  570.144601]  [<ffffffff81001a26>] syscall_return_slowpath+0x16d/0x19b
[  570.144601]  [<ffffffff813babb1>] int_ret_from_sys_call+0x25/0x9f
[  570.144601] Code: 48 8b 83 c8 01 00 00 a8 01 74 12 48 89 df e8 8b 27 14 e1 b8 f7 ff ff ff e9 b7 00 00 00 8a 43 12 a8 0b 74 1c 48 8b 83 a8 04 00 00 <48> 8b 80 e0 04 00 00 65 ff 08 48 c7 83 a8 04 00 00 00 00 00 00
[  570.144601] RIP  [<ffffffffa018c701>] pppoe_release+0x50/0x101 [pppoe]
[  570.144601]  RSP <ffff880036b63e08>
[  570.144601] CR2: 00000000000004e0
[  570.200518] ---[ end trace 46956baf17349563 ]---

pppoe_flush_dev() has no reason to override sk->sk_state with
PPPOX_ZOMBIE. pppox_unbind_sock() already sets sk->sk_state to
PPPOX_DEAD, which is the correct state given that sk is unbound and
po->pppoe_dev is NULL.

Fixes: 2b018d57ff18 ("pppoe: drop PPPOX_ZOMBIEs in pppoe_release")
Tested-by: Oleksii Berezhniak <core@irc.lg.ua>
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
---
 drivers/net/ppp/pppoe.c | 1 -
 1 file changed, 1 deletion(-)

Comments

Denys Fedoryshchenko Oct. 2, 2015, 8:01 a.m. UTC | #1
Here is similar panic after patch applied (it might be different bug), 
got over netconsole:

  [126348.610996] BUG: unable to handle kernel
  NULL pointer dereference
  at 0000000000000428
  [126348.611656] IP:
  [<ffffffffa00ea129>] pppoe_release+0x56/0x142 [pppoe]
  [126348.612033] PGD 17d0b03067
  PUD 17c721b067
  PMD 0

  [126348.612545] Oops: 0000 [#1]
  SMP

  [126348.612981] Modules linked in:
  act_skbedit
  sch_fq
  cls_fw
  act_police
  cls_u32
  sch_ingress
  sch_sfq
  sch_htb
  pppoe
  pppox
  ppp_generic
  slhc
  netconsole
  configfs
  xt_nat
  ts_bm
  xt_string
  xt_connmark
  xt_TCPMSS
  xt_tcpudp
  xt_mark
  iptable_filter
  iptable_nat
  nf_conntrack_ipv4
  nf_defrag_ipv4
  nf_nat_ipv4
  nf_nat
  nf_conntrack
  iptable_mangle
  ip_tables
  x_tables
  8021q
  garp
  mrp
  stp
  llc
  bonding

  [126348.617115] CPU: 0 PID: 5254 Comm: accel-pppd Not tainted 
4.2.2-build-0087 #2
  [126348.617632] Hardware name: Intel Corporation S2600GZ/S2600GZ, BIOS 
SE5C600.86B.02.03.0003.041920141333 04/19/2014
  [126348.618193] task: ffff8817cfbe0000 ti: ffff8817c6350000 task.ti: 
ffff8817c6350000
  [126348.618696] RIP: 0010:[<ffffffffa00ea129>]
  [<ffffffffa00ea129>] pppoe_release+0x56/0x142 [pppoe]
  [126348.619306] RSP: 0018:ffff8817c6353e28  EFLAGS: 00010202
  [126348.619601] RAX: 0000000000000000 RBX: ffff8817a92b0400 RCX: 
0000000000000000
  [126348.620152] RDX: 0000000000000001 RSI: 00000000fffffe01 RDI: 
ffffffff8180c18a
  [126348.620715] RBP: ffff8817c6353e68 R08: 0000000000000000 R09: 
0000000000000000
  [126348.621254] R10: ffff88173c02b210 R11: 0000000000000293 R12: 
ffff8817b3c18000
  [126348.621784] R13: ffff8817b3c18030 R14: ffff8817967f1140 R15: 
ffff8817d226c920
  [126348.622330] FS:  00007f9444db9700(0000) GS:ffff8817dee00000(0000) 
knlGS:0000000000000000
  [126348.622876] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [126348.623202] CR2: 0000000000000428 CR3: 00000017c70b2000 CR4: 
00000000001406f0
  [126348.623760] Stack:
  [126348.624056]  0000000100200018
  0000000000000000
  0000000100000000
  ffff8817b3c18000

  [126348.624925]  ffffffffa00ec280
  ffff8817b3c18030
  ffff8817967f1140
  ffff8817d226c920

  [126348.625736]  ffff8817c6353e88
  ffffffff8180820a
  ffff88173c02b200
  0000000000000008

  [126348.626533] Call Trace:
  [126348.626873]  [<ffffffff8180820a>] sock_release+0x1a/0x70
  [126348.627183]  [<ffffffff8180826d>] sock_close+0xd/0x11
  [126348.627512]  [<ffffffff81152c61>] __fput+0xdf/0x193
  [126348.627845]  [<ffffffff81152d43>] ____fput+0x9/0xb
  [126348.628169]  [<ffffffff810d098e>] task_work_run+0x78/0x8f
  [126348.628517]  [<ffffffff810038a9>] do_notify_resume+0x40/0x4e
  [126348.628837]  [<ffffffff818a5a0a>] int_signal+0x12/0x17
  [126348.629131] Code:
  48
  8b
  83
  e0
  00
  00
  00
  a8
  01
  74
  12
  48
  89
  df
  e8
  0d
  24
  72
  e1
  b8
  f7
  ff
  ff
  ff
  e9
  eb
  00
  00
  00
  8a
  43
  12
  a8
  0b
  74
  1c
  48
  8b
  83
  a0
  02
  00
  00

  8b
  80
  28
  04
  00
  00
  65
  ff
  08
  48
  c7
  83
  a0
  02
  00
  00
  00
  00
  00
  00

  [126348.635060] RIP
  [<ffffffffa00ea129>] pppoe_release+0x56/0x142 [pppoe]
  [126348.635432]  RSP <ffff8817c6353e28>
  [126348.635718] CR2: 0000000000000428
  [126348.641165] ---[ end trace 911ff90a1416e3d1 ]---
  [126348.653235] Kernel panic - not syncing: Fatal exception
  [126348.653538] Kernel Offset: disabled
  [126348.677177] Rebooting in 5 seconds..




On 2015-09-30 12:45, Guillaume Nault wrote:
> Since commit 2b018d57ff18 ("pppoe: drop PPPOX_ZOMBIEs in 
> pppoe_release"),
> pppoe_release() calls dev_put(po->pppoe_dev) if sk is in the
> PPPOX_ZOMBIE state. But pppoe_flush_dev() can set sk->sk_state to
> PPPOX_ZOMBIE _and_ reset po->pppoe_dev to NULL. This leads to the
> following oops:
> 
> [  570.140800] BUG: unable to handle kernel NULL pointer dereference
> at 00000000000004e0
> [  570.142931] IP: [<ffffffffa018c701>] pppoe_release+0x50/0x101 
> [pppoe]
> [  570.144601] PGD 3d119067 PUD 3dbc1067 PMD 0
> [  570.144601] Oops: 0000 [#1] SMP
> [  570.144601] Modules linked in: l2tp_ppp l2tp_netlink l2tp_core
> ip6_udp_tunnel udp_tunnel pppoe pppox ppp_generic slhc loop
> crc32c_intel ghash_clmulni_intel jitterentropy_rng sha256_generic hmac
> drbg ansi_cprng aesni_intel aes_x86_64 ablk_helper cryptd lrw gf128mul
> glue_helper acpi_cpufreq evdev serio_raw processor button ext4 crc16
> mbcache jbd2 virtio_net virtio_blk virtio_pci virtio_ring virtio
> [  570.144601] CPU: 1 PID: 15738 Comm: ppp-apitest Not tainted 4.2.0 #1
> [  570.144601] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
> BIOS Debian-1.8.2-1 04/01/2014
> [  570.144601] task: ffff88003d30d600 ti: ffff880036b60000 task.ti:
> ffff880036b60000
> [  570.144601] RIP: 0010:[<ffffffffa018c701>]  [<ffffffffa018c701>]
> pppoe_release+0x50/0x101 [pppoe]
> [  570.144601] RSP: 0018:ffff880036b63e08  EFLAGS: 00010202
> [  570.144601] RAX: 0000000000000000 RBX: ffff880034340000 RCX: 
> 0000000000000206
> [  570.144601] RDX: 0000000000000006 RSI: ffff88003d30dd20 RDI: 
> ffff88003d30dd20
> [  570.144601] RBP: ffff880036b63e28 R08: 0000000000000001 R09: 
> 0000000000000000
> [  570.144601] R10: 00007ffee9b50420 R11: ffff880034340078 R12: 
> ffff8800387ec780
> [  570.144601] R13: ffff8800387ec7b0 R14: ffff88003e222aa0 R15: 
> ffff8800387ec7b0
> [  570.144601] FS:  00007f5672f48700(0000) GS:ffff88003fc80000(0000)
> knlGS:0000000000000000
> [  570.144601] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  570.144601] CR2: 00000000000004e0 CR3: 0000000037f7e000 CR4: 
> 00000000000406a0
> [  570.144601] Stack:
> [  570.144601]  ffffffffa018f240 ffff8800387ec780 ffffffffa018f240
> ffff8800387ec7b0
> [  570.144601]  ffff880036b63e48 ffffffff812caabe ffff880039e4e000
> 0000000000000008
> [  570.144601]  ffff880036b63e58 ffffffff812cabad ffff880036b63ea8
> ffffffff811347f5
> [  570.144601] Call Trace:
> [  570.144601]  [<ffffffff812caabe>] sock_release+0x1a/0x75
> [  570.144601]  [<ffffffff812cabad>] sock_close+0xd/0x11
> [  570.144601]  [<ffffffff811347f5>] __fput+0xff/0x1a5
> [  570.144601]  [<ffffffff811348cb>] ____fput+0x9/0xb
> [  570.144601]  [<ffffffff81056682>] task_work_run+0x66/0x90
> [  570.144601]  [<ffffffff8100189e>] prepare_exit_to_usermode+0x8c/0xa7
> [  570.144601]  [<ffffffff81001a26>] 
> syscall_return_slowpath+0x16d/0x19b
> [  570.144601]  [<ffffffff813babb1>] int_ret_from_sys_call+0x25/0x9f
> [  570.144601] Code: 48 8b 83 c8 01 00 00 a8 01 74 12 48 89 df e8 8b
> 27 14 e1 b8 f7 ff ff ff e9 b7 00 00 00 8a 43 12 a8 0b 74 1c 48 8b 83
> a8 04 00 00 <48> 8b 80 e0 04 00 00 65 ff 08 48 c7 83 a8 04 00 00 00 00
> 00 00
> [  570.144601] RIP  [<ffffffffa018c701>] pppoe_release+0x50/0x101 
> [pppoe]
> [  570.144601]  RSP <ffff880036b63e08>
> [  570.144601] CR2: 00000000000004e0
> [  570.200518] ---[ end trace 46956baf17349563 ]---
> 
> pppoe_flush_dev() has no reason to override sk->sk_state with
> PPPOX_ZOMBIE. pppox_unbind_sock() already sets sk->sk_state to
> PPPOX_DEAD, which is the correct state given that sk is unbound and
> po->pppoe_dev is NULL.
> 
> Fixes: 2b018d57ff18 ("pppoe: drop PPPOX_ZOMBIEs in pppoe_release")
> Tested-by: Oleksii Berezhniak <core@irc.lg.ua>
> Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
> ---
>  drivers/net/ppp/pppoe.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/drivers/net/ppp/pppoe.c b/drivers/net/ppp/pppoe.c
> index 3837ae3..2ed7506 100644
> --- a/drivers/net/ppp/pppoe.c
> +++ b/drivers/net/ppp/pppoe.c
> @@ -313,7 +313,6 @@ static void pppoe_flush_dev(struct net_device *dev)
>  			if (po->pppoe_dev == dev &&
>  			    sk->sk_state & (PPPOX_CONNECTED | PPPOX_BOUND | PPPOX_ZOMBIE)) 
> {
>  				pppox_unbind_sock(sk);
> -				sk->sk_state = PPPOX_ZOMBIE;
>  				sk->sk_state_change(sk);
>  				po->pppoe_dev = NULL;
>  				dev_put(dev);
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Guillaume Nault Oct. 2, 2015, 5:54 p.m. UTC | #2
On Fri, Oct 02, 2015 at 11:01:45AM +0300, Denys Fedoryshchenko wrote:
> Here is similar panic after patch applied (it might be different bug), got
> over netconsole:
> 
>  [126348.617115] CPU: 0 PID: 5254 Comm: accel-pppd Not tainted
> 4.2.2-build-0087 #2
>  [126348.617632] Hardware name: Intel Corporation S2600GZ/S2600GZ, BIOS
> SE5C600.86B.02.03.0003.041920141333 04/19/2014
>  [126348.618193] task: ffff8817cfbe0000 ti: ffff8817c6350000 task.ti:
> ffff8817c6350000
>  [126348.618696] RIP: 0010:[<ffffffffa00ea129>]
>  [<ffffffffa00ea129>] pppoe_release+0x56/0x142 [pppoe]
>  [126348.619306] RSP: 0018:ffff8817c6353e28  EFLAGS: 00010202
>  [126348.619601] RAX: 0000000000000000 RBX: ffff8817a92b0400 RCX:
> 0000000000000000
>  [126348.620152] RDX: 0000000000000001 RSI: 00000000fffffe01 RDI:
> ffffffff8180c18a
>  [126348.620715] RBP: ffff8817c6353e68 R08: 0000000000000000 R09:
> 0000000000000000
>  [126348.621254] R10: ffff88173c02b210 R11: 0000000000000293 R12:
> ffff8817b3c18000
>  [126348.621784] R13: ffff8817b3c18030 R14: ffff8817967f1140 R15:
> ffff8817d226c920
>  [126348.622330] FS:  00007f9444db9700(0000) GS:ffff8817dee00000(0000)
> knlGS:0000000000000000
>  [126348.622876] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>  [126348.623202] CR2: 0000000000000428 CR3: 00000017c70b2000 CR4:
> 00000000001406f0
>  [126348.623760] Stack:
>  [126348.624056]  0000000100200018
>  0000000000000000
>  0000000100000000
>  ffff8817b3c18000
> 
>  [126348.624925]  ffffffffa00ec280
>  ffff8817b3c18030
>  ffff8817967f1140
>  ffff8817d226c920
> 
>  [126348.625736]  ffff8817c6353e88
>  ffffffff8180820a
>  ffff88173c02b200
>  0000000000000008
> 
>  [126348.626533] Call Trace:
>  [126348.626873]  [<ffffffff8180820a>] sock_release+0x1a/0x70
>  [126348.627183]  [<ffffffff8180826d>] sock_close+0xd/0x11
>  [126348.627512]  [<ffffffff81152c61>] __fput+0xdf/0x193
>  [126348.627845]  [<ffffffff81152d43>] ____fput+0x9/0xb
>  [126348.628169]  [<ffffffff810d098e>] task_work_run+0x78/0x8f
>  [126348.628517]  [<ffffffff810038a9>] do_notify_resume+0x40/0x4e
>  [126348.628837]  [<ffffffff818a5a0a>] int_signal+0x12/0x17

Ok, so there's another possibility for pppoe_release() to be called while
sk->sk_state is PPPOX_{CONNECTED,BOUND,ZOMBIE} but po->pppoe_dev is NULL.

I'll check the code to see if I can find any race wrt. po->pppoe_dev
and sk->sk_state settings.

In a previous message, you said you'd try reverting 287f3a943fef
("pppoe: Use workqueue to die properly when a PADT is received") and
related patches. I guess "related patches" means 665a6cd809f4 ("pppoe:
drop pppoe device in pppoe_unbind_sock_work"), right?.
Did these reverts give any successful result?

BTW, please don't top-post.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Denys Fedoryshchenko Oct. 4, 2015, 4:08 p.m. UTC | #3
On 2015-10-02 20:54, Guillaume Nault wrote:
> On Fri, Oct 02, 2015 at 11:01:45AM +0300, Denys Fedoryshchenko wrote:
>> Here is similar panic after patch applied (it might be different bug), 
>> got
>> over netconsole:
>> 
>>  [126348.617115] CPU: 0 PID: 5254 Comm: accel-pppd Not tainted
>> 4.2.2-build-0087 #2
>>  [126348.617632] Hardware name: Intel Corporation S2600GZ/S2600GZ, 
>> BIOS
>> SE5C600.86B.02.03.0003.041920141333 04/19/2014
>>  [126348.618193] task: ffff8817cfbe0000 ti: ffff8817c6350000 task.ti:
>> ffff8817c6350000
>>  [126348.618696] RIP: 0010:[<ffffffffa00ea129>]
>>  [<ffffffffa00ea129>] pppoe_release+0x56/0x142 [pppoe]
>>  [126348.619306] RSP: 0018:ffff8817c6353e28  EFLAGS: 00010202
>>  [126348.619601] RAX: 0000000000000000 RBX: ffff8817a92b0400 RCX:
>> 0000000000000000
>>  [126348.620152] RDX: 0000000000000001 RSI: 00000000fffffe01 RDI:
>> ffffffff8180c18a
>>  [126348.620715] RBP: ffff8817c6353e68 R08: 0000000000000000 R09:
>> 0000000000000000
>>  [126348.621254] R10: ffff88173c02b210 R11: 0000000000000293 R12:
>> ffff8817b3c18000
>>  [126348.621784] R13: ffff8817b3c18030 R14: ffff8817967f1140 R15:
>> ffff8817d226c920
>>  [126348.622330] FS:  00007f9444db9700(0000) GS:ffff8817dee00000(0000)
>> knlGS:0000000000000000
>>  [126348.622876] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>  [126348.623202] CR2: 0000000000000428 CR3: 00000017c70b2000 CR4:
>> 00000000001406f0
>>  [126348.623760] Stack:
>>  [126348.624056]  0000000100200018
>>  0000000000000000
>>  0000000100000000
>>  ffff8817b3c18000
>> 
>>  [126348.624925]  ffffffffa00ec280
>>  ffff8817b3c18030
>>  ffff8817967f1140
>>  ffff8817d226c920
>> 
>>  [126348.625736]  ffff8817c6353e88
>>  ffffffff8180820a
>>  ffff88173c02b200
>>  0000000000000008
>> 
>>  [126348.626533] Call Trace:
>>  [126348.626873]  [<ffffffff8180820a>] sock_release+0x1a/0x70
>>  [126348.627183]  [<ffffffff8180826d>] sock_close+0xd/0x11
>>  [126348.627512]  [<ffffffff81152c61>] __fput+0xdf/0x193
>>  [126348.627845]  [<ffffffff81152d43>] ____fput+0x9/0xb
>>  [126348.628169]  [<ffffffff810d098e>] task_work_run+0x78/0x8f
>>  [126348.628517]  [<ffffffff810038a9>] do_notify_resume+0x40/0x4e
>>  [126348.628837]  [<ffffffff818a5a0a>] int_signal+0x12/0x17
> 
> Ok, so there's another possibility for pppoe_release() to be called 
> while
> sk->sk_state is PPPOX_{CONNECTED,BOUND,ZOMBIE} but po->pppoe_dev is 
> NULL.
> 
> I'll check the code to see if I can find any race wrt. po->pppoe_dev
> and sk->sk_state settings.
> 
> In a previous message, you said you'd try reverting 287f3a943fef
> ("pppoe: Use workqueue to die properly when a PADT is received") and
> related patches. I guess "related patches" means 665a6cd809f4 ("pppoe:
> drop pppoe device in pppoe_unbind_sock_work"), right?.
> Did these reverts give any successful result?
> 
> BTW, please don't top-post.
I am doing just "dirty" patch like this, i cannot certainly remember if 
i was doing git reversal, because
it was a while when i spotted this bug. After that pppoe server is not 
rebooting.

diff -Naur linux-4.2.2-vanilla/drivers/net/ppp/pppoe.c 
linux-4.2.2-changed/drivers/net/ppp/pppoe.c
--- linux-4.2.2-vanilla/drivers/net/ppp/pppoe.c 2015-09-29 
20:38:27.000000000 +0300
+++ linux-4.2.2-changed/drivers/net/ppp/pppoe.c 2015-10-04 
19:05:55.697732991 +0300
@@ -519,7 +519,7 @@
                 }

                 bh_unlock_sock(sk);
-               if (!schedule_work(&po->proto.pppoe.padt_work))
+//             if (!schedule_work(&po->proto.pppoe.padt_work))
                         sock_put(sk);
         }

@@ -633,7 +633,7 @@

         lock_sock(sk);

-       INIT_WORK(&po->proto.pppoe.padt_work, pppoe_unbind_sock_work);
+//     INIT_WORK(&po->proto.pppoe.padt_work, pppoe_unbind_sock_work);

         error = -EINVAL;
         if (sp->sa_protocol != PX_PROTO_OE)




--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller Oct. 5, 2015, 10:05 a.m. UTC | #4
From: Guillaume Nault <g.nault@alphalink.fr>
Date: Wed, 30 Sep 2015 11:45:33 +0200

> Since commit 2b018d57ff18 ("pppoe: drop PPPOX_ZOMBIEs in pppoe_release"),
> pppoe_release() calls dev_put(po->pppoe_dev) if sk is in the
> PPPOX_ZOMBIE state. But pppoe_flush_dev() can set sk->sk_state to
> PPPOX_ZOMBIE _and_ reset po->pppoe_dev to NULL. This leads to the
> following oops:
 ...
> pppoe_flush_dev() has no reason to override sk->sk_state with
> PPPOX_ZOMBIE. pppox_unbind_sock() already sets sk->sk_state to
> PPPOX_DEAD, which is the correct state given that sk is unbound and
> po->pppoe_dev is NULL.
> 
> Fixes: 2b018d57ff18 ("pppoe: drop PPPOX_ZOMBIEs in pppoe_release")
> Tested-by: Oleksii Berezhniak <core@irc.lg.ua>
> Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>

Applied and queued up for -stable, thanks.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Guillaume Nault Oct. 5, 2015, 12:08 p.m. UTC | #5
> I am doing just "dirty" patch like this, i cannot certainly remember if i
> was doing git reversal, because
> it was a while when i spotted this bug. After that pppoe server is not
> rebooting.
> 
> diff -Naur linux-4.2.2-vanilla/drivers/net/ppp/pppoe.c
> linux-4.2.2-changed/drivers/net/ppp/pppoe.c
> --- linux-4.2.2-vanilla/drivers/net/ppp/pppoe.c 2015-09-29
> 20:38:27.000000000 +0300
> +++ linux-4.2.2-changed/drivers/net/ppp/pppoe.c 2015-10-04
> 19:05:55.697732991 +0300
> @@ -519,7 +519,7 @@
>                 }
> 
>                 bh_unlock_sock(sk);
> -               if (!schedule_work(&po->proto.pppoe.padt_work))
> +//             if (!schedule_work(&po->proto.pppoe.padt_work))
>                         sock_put(sk);
>         }
> 
> @@ -633,7 +633,7 @@
> 
>         lock_sock(sk);
> 
> -       INIT_WORK(&po->proto.pppoe.padt_work, pppoe_unbind_sock_work);
> +//     INIT_WORK(&po->proto.pppoe.padt_work, pppoe_unbind_sock_work);
> 
>         error = -EINVAL;
>         if (sp->sa_protocol != PX_PROTO_OE)
> 
Ok, so this is clearly related with PADT message handling. Setting
sk->sk_state to PPPOX_ZOMBIE in pppoe_disc_rcv() looks wrong to me.
Furthurmore, at a first glance, it doesn't look necessary. If you're
feeling lucky, you can try the following diff (WARNING: not even
compile-tested!):

 	if (po) {
 		struct sock *sk = sk_pppox(po);
 
-		bh_lock_sock(sk);
-
-		/* If the user has locked the socket, just ignore
-		 * the packet.  With the way two rcv protocols hook into
-		 * one socket family type, we cannot (easily) distinguish
-		 * what kind of SKB it is during backlog rcv.
-		 */
-		if (sock_owned_by_user(sk) == 0) {
-			/* We're no longer connect at the PPPOE layer,
-			 * and must wait for ppp channel to disconnect us.
-			 */
-			sk->sk_state = PPPOX_ZOMBIE;
-		}
-
-		bh_unlock_sock(sk);
 		if (!schedule_work(&po->proto.pppoe.padt_work))
 			sock_put(sk);
 	}

I'll take a closer look and do proper testing during the week.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Guillaume Nault Oct. 5, 2015, 12:24 p.m. UTC | #6
On Mon, Oct 05, 2015 at 04:08:51AM +0000, Matt Bennett wrote:
> Hi, I am seeing this panic occur occasionally however I am unsure how to
> go about reproducing it. Is it enough to simply keep creating and
> tearing down the PPP interface? I can also test and/or investigate this
> issue if a suitable reproduction method is available.
> 
There are at least two issues resulting in similar Oops.

The first one goes with MTU/address/link state updates on the
underlying interface: any such update on an interface used by a
PPPoE connection will generally result in an Oops when releasing the
PPPoE connection. This is fixed by e6740165b8f7 ("ppp: don't override
sk->sk_state in pppoe_flush_dev()").

The second one seems to be trickier. It looks like a race wrt. PADT
message reception. Reproducing the bug will probably require to
generate some PADT flooding to a host that creates and releases PPPoE
connections.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Matt Bennett Oct. 6, 2015, 12:26 a.m. UTC | #7
On Mon, 2015-10-05 at 14:24 +0200, Guillaume Nault wrote:
> On Mon, Oct 05, 2015 at 04:08:51AM +0000, Matt Bennett wrote:

> > Hi, I am seeing this panic occur occasionally however I am unsure how to

> > go about reproducing it. Is it enough to simply keep creating and

> > tearing down the PPP interface? I can also test and/or investigate this

> > issue if a suitable reproduction method is available.

> > 

> There are at least two issues resulting in similar Oops.

> 

> The first one goes with MTU/address/link state updates on the

> underlying interface: any such update on an interface used by a

> PPPoE connection will generally result in an Oops when releasing the

> PPPoE connection. This is fixed by e6740165b8f7 ("ppp: don't override

> sk->sk_state in pppoe_flush_dev()").


Without your patch ("ppp: don't override sk->sk_state in
pppoe_flush_dev()") I can see the following function calls being made
when changing the mtu on the underlying ethernet interface for the PPPoE
connection:

1. pppoe_flush_dev() - setting PPPOX_ZOMBIE

2. pppoe_connect - setting PPPOX_NONE (shown below)

/* Delete the old binding */
	if (stage_session(po->pppoe_pa.sid)) {
		pppox_unbind_sock(sk);
		pn = pppoe_pernet(sock_net(sk));
		delete_item(pn, po->pppoe_pa.sid,
			    po->pppoe_pa.remote, po->pppoe_ifindex);
		if (po->pppoe_dev) {
			dev_put(po->pppoe_dev);
			po->pppoe_dev = NULL;
		}

		memset(sk_pppox(po) + 1, 0,
		       sizeof(struct pppox_sock) - sizeof(struct sock));
		sk->sk_state = PPPOX_NONE;
	}

3. pppoe_release - No oops (since sk->sk_state is no longer in
{PPPOX_CONNECTED,PPPOX_BOUND,PPPOX_ZOMBIE})

It doesn't look to me like the above functions can execute
asynchronously but I'd have to look harder. I am using 3.16 by the way.

> 

> The second one seems to be trickier. It looks like a race wrt. PADT

> message reception. Reproducing the bug will probably require to

> generate some PADT flooding to a host that creates and releases PPPoE

> connections.


I will investigate the PADT message reception however since I am on 3.16
I don't have the commits for "pppoe: Use workqueue to die properly when
a PADT is received" and "pppoe: drop pppoe device in
pppoe_unbind_sock_work". 

Matt
> --

> To unsubscribe from this list: send the line "unsubscribe netdev" in

> the body of a message to majordomo@vger.kernel.org

> More majordomo info at  http://vger.kernel.org/majordomo-info.html
Matt Bennett Oct. 6, 2015, 4:46 a.m. UTC | #8
> > The second one seems to be trickier. It looks like a race wrt. PADT

> > message reception. Reproducing the bug will probably require to

> > generate some PADT flooding to a host that creates and releases PPPoE

> > connections.


Ok I think I can see the potential race here, specifically the PADT
frame is received while the pppoe interface is being deleted. (I will
have a go inducing this with msleep() in the code tomorrow)

1. pppoe_flush_dev() - sk->sk_state = PPPOX_DEAD, po->pppoe_dev = NULL

2. pppoe_connect() - sk->sk_state = PPPOX_NONE, po->pppoe_dev = NULL

3. pppoe_disc_rcv() - sk->sk_state = PPPOX_ZOMBIE po->pppoe_dev = NULL

4. pppoe_release() - dev_put(po->pppoe_dev) ----> Oops


Either in pppoe_disc_rcv() we add the condition:

@@ -496,7 +499,8 @@ static int pppoe_disc_rcv(struct sk_buff *skb,
struct net_device *dev,
                        /* We're no longer connect at the PPPOE layer,
                         * and must wait for ppp channel to disconnect
us.
                         */
-                       sk->sk_state = PPPOX_ZOMBIE;
+                       if (sk->sk_state & PPPOX_CONNECTED)
+                               sk->sk_state = PPPOX_ZOMBIE;
                }

Or perhaps we remove the assumption that the state PPPOX_ZOMBIE has a
non-null pppoe_dev on it.

I don't know why the code isn't like the following anyway.

-if (sk->sk_state & (PPPOX_CONNECTED | PPPOX_BOUND | PPPOX_ZOMBIE)) {
+if (po->pppoe_dev) {
	dev_put(po->pppoe_dev);
	po->pppoe_dev = NULL;
}
Guillaume Nault Oct. 6, 2015, 8:50 a.m. UTC | #9
On Tue, Oct 06, 2015 at 12:26:20AM +0000, Matt Bennett wrote:
> On Mon, 2015-10-05 at 14:24 +0200, Guillaume Nault wrote:
> > On Mon, Oct 05, 2015 at 04:08:51AM +0000, Matt Bennett wrote:
> > > Hi, I am seeing this panic occur occasionally however I am unsure how to
> > > go about reproducing it. Is it enough to simply keep creating and
> > > tearing down the PPP interface? I can also test and/or investigate this
> > > issue if a suitable reproduction method is available.
> > > 
> > There are at least two issues resulting in similar Oops.
> > 
> > The first one goes with MTU/address/link state updates on the
> > underlying interface: any such update on an interface used by a
> > PPPoE connection will generally result in an Oops when releasing the
> > PPPoE connection. This is fixed by e6740165b8f7 ("ppp: don't override
> > sk->sk_state in pppoe_flush_dev()").
> 
> Without your patch ("ppp: don't override sk->sk_state in
> pppoe_flush_dev()") I can see the following function calls being made
> when changing the mtu on the underlying ethernet interface for the PPPoE
> connection:
> 
> 1. pppoe_flush_dev() - setting PPPOX_ZOMBIE
> 
> 2. pppoe_connect - setting PPPOX_NONE (shown below)
> 
> /* Delete the old binding */
> 	if (stage_session(po->pppoe_pa.sid)) {
> 		pppox_unbind_sock(sk);
> 		pn = pppoe_pernet(sock_net(sk));
> 		delete_item(pn, po->pppoe_pa.sid,
> 			    po->pppoe_pa.remote, po->pppoe_ifindex);
> 		if (po->pppoe_dev) {
> 			dev_put(po->pppoe_dev);
> 			po->pppoe_dev = NULL;
> 		}
> 
> 		memset(sk_pppox(po) + 1, 0,
> 		       sizeof(struct pppox_sock) - sizeof(struct sock));
> 		sk->sk_state = PPPOX_NONE;
> 	}
> 
> 3. pppoe_release - No oops (since sk->sk_state is no longer in
> {PPPOX_CONNECTED,PPPOX_BOUND,PPPOX_ZOMBIE})
> 
> It doesn't look to me like the above functions can execute
> asynchronously but I'd have to look harder. I am using 3.16 by the way.
> 
Just drop the pppoe_connect() call. Right after the pppoe_flush_dev()
call, sk_state is PPPOX_ZOMBIE and pppoe_dev is NULL. This is enouhg to
make pppoe_release() crash.

The typical scenario e6740165b8f7 ("ppp: don't override sk->sk_state in
pppoe_flush_dev()") fixes is:

  Userspace process #1:                       Userspace process #2:
  ---------------------                       ---------------------
    fd = socket(AF_PPPOX, PX_PROTO_OE, 0);
    connect(fd, {AF_PPPOX, PX_PROTO_EO,
            $sid, $mac_addr, $ifname},
            sizeof(struct sockaddr_pppox));

    ... process_packets() ...                   # ip link set $ifname mtu $mtu

    close(fd); --> Kernel Oops
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Guillaume Nault Oct. 6, 2015, 9:46 a.m. UTC | #10
On Tue, Oct 06, 2015 at 04:46:04AM +0000, Matt Bennett wrote:
> > > The second one seems to be trickier. It looks like a race wrt. PADT
> > > message reception. Reproducing the bug will probably require to
> > > generate some PADT flooding to a host that creates and releases PPPoE
> > > connections.
> 
> Ok I think I can see the potential race here, specifically the PADT
> frame is received while the pppoe interface is being deleted. (I will
> have a go inducing this with msleep() in the code tomorrow)
> 
> 1. pppoe_flush_dev() - sk->sk_state = PPPOX_DEAD, po->pppoe_dev = NULL
> 
> 2. pppoe_connect() - sk->sk_state = PPPOX_NONE, po->pppoe_dev = NULL
> 
> 3. pppoe_disc_rcv() - sk->sk_state = PPPOX_ZOMBIE po->pppoe_dev = NULL
> 
> 4. pppoe_release() - dev_put(po->pppoe_dev) ----> Oops
> 
Again, I don't know why you introduce pppoe_connect() into the mix.
But anyway, you got the point. Note that pppoe_flush_dev() could be
replaced by other calls since we just need to reset po->pppoe_dev
(another pppoe_unbind_sock_work() call, due to duplicated PADT, would
also trigger the bug). Note also that pppoe_release() needs to be run
before pppoe_unbind_sock_work() gets scheduled (or at least before it
locks the socket).

> Either in pppoe_disc_rcv() we add the condition:
> 
> @@ -496,7 +499,8 @@ static int pppoe_disc_rcv(struct sk_buff *skb,
> struct net_device *dev,
>                         /* We're no longer connect at the PPPOE layer,
>                          * and must wait for ppp channel to disconnect
> us.
>                          */
> -                       sk->sk_state = PPPOX_ZOMBIE;
> +                       if (sk->sk_state & PPPOX_CONNECTED)
> +                               sk->sk_state = PPPOX_ZOMBIE;
>                 }
> 
> Or perhaps we remove the assumption that the state PPPOX_ZOMBIE has a
> non-null pppoe_dev on it.
> 
I don't think adding complexity in the socket state management would be
a good think. Actually I event think about dropping the PPPOX_ZOMBIE
state altogether. But that's probably something for net-next.

> I don't know why the code isn't like the following anyway.
> 
> -if (sk->sk_state & (PPPOX_CONNECTED | PPPOX_BOUND | PPPOX_ZOMBIE)) {
> +if (po->pppoe_dev) {
> 	dev_put(po->pppoe_dev);
> 	po->pppoe_dev = NULL;
> }
I was thinking about that same approach. pppoe_release() is the only
function making that assumption. Other parts of the code seem to only
require that PPPOX_CONNECTED => pppoe_dev != NULL.

But I think the original condition was valid. Adding PPPOX_ZOMBIE into
the test and resetting pppoe_dev upon reception of PADT have changed the
relationship between sk_state and pppoe_dev, which is where the problem
stands.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Matt Bennett Oct. 6, 2015, 9:12 p.m. UTC | #11
On Tue, 2015-10-06 at 11:46 +0200, Guillaume Nault wrote:
> On Tue, Oct 06, 2015 at 04:46:04AM +0000, Matt Bennett wrote:

> > > > The second one seems to be trickier. It looks like a race wrt. PADT

> > > > message reception. Reproducing the bug will probably require to

> > > > generate some PADT flooding to a host that creates and releases PPPoE

> > > > connections.

> > 

> > Ok I think I can see the potential race here, specifically the PADT

> > frame is received while the pppoe interface is being deleted. (I will

> > have a go inducing this with msleep() in the code tomorrow)

> > 

> > 1. pppoe_flush_dev() - sk->sk_state = PPPOX_DEAD, po->pppoe_dev = NULL

> > 

> > 2. pppoe_connect() - sk->sk_state = PPPOX_NONE, po->pppoe_dev = NULL

> > 

> > 3. pppoe_disc_rcv() - sk->sk_state = PPPOX_ZOMBIE po->pppoe_dev = NULL

> > 

> > 4. pppoe_release() - dev_put(po->pppoe_dev) ----> Oops

> > 

> Again, I don't know why you introduce pppoe_connect() into the mix.

Sorry, I'm just going off the function calls I can see happening with my
kernel (3.16).
> But anyway, you got the point. Note that pppoe_flush_dev() could be

> replaced by other calls since we just need to reset po->pppoe_dev

> (another pppoe_unbind_sock_work() call, due to duplicated PADT, would

> also trigger the bug). Note also that pppoe_release() needs to be run

> before pppoe_unbind_sock_work() gets scheduled (or at least before it

> locks the socket).

> 

> > Either in pppoe_disc_rcv() we add the condition:

> > 

> > @@ -496,7 +499,8 @@ static int pppoe_disc_rcv(struct sk_buff *skb,

> > struct net_device *dev,

> >                         /* We're no longer connect at the PPPOE layer,

> >                          * and must wait for ppp channel to disconnect

> > us.

> >                          */

> > -                       sk->sk_state = PPPOX_ZOMBIE;

> > +                       if (sk->sk_state & PPPOX_CONNECTED)

> > +                               sk->sk_state = PPPOX_ZOMBIE;

> >                 }

> > 

> > Or perhaps we remove the assumption that the state PPPOX_ZOMBIE has a

> > non-null pppoe_dev on it.

> > 

> I don't think adding complexity in the socket state management would be

> a good think. Actually I event think about dropping the PPPOX_ZOMBIE

> state altogether. But that's probably something for net-next.

> 

> > I don't know why the code isn't like the following anyway.

> > 

> > -if (sk->sk_state & (PPPOX_CONNECTED | PPPOX_BOUND | PPPOX_ZOMBIE)) {

> > +if (po->pppoe_dev) {

> > 	dev_put(po->pppoe_dev);

> > 	po->pppoe_dev = NULL;

> > }

> I was thinking about that same approach. pppoe_release() is the only

> function making that assumption. Other parts of the code seem to only

> require that PPPOX_CONNECTED => pppoe_dev != NULL.

> 

> But I think the original condition was valid. Adding PPPOX_ZOMBIE into

> the test and resetting pppoe_dev upon reception of PADT have changed the

> relationship between sk_state and pppoe_dev, which is where the problem

> stands.

Yes originally the condition was valid. But I think the issue is plain
to see when you look at the comment beside PPPOX_ZOMBIE declared in the
enum.

 PPPOX_ZOMBIE	= 8,  /* dead, but still bound to ppp device */

We have seen in the situation we have described previously that we can
be in this state without being bound to the ppp device.

In my opinion the entire logic around
pppoe_disc_rcv()/pppoe_unbind_sock_work() looks wrong and I agree we
should do what you suggested a few emails back.

i.e in pppoe_disc_rcv():

if (po) {
		struct sock *sk = sk_pppox(po);

-		bh_lock_sock(sk);
-
-		/* If the user has locked the socket, just ignore
-		 * the packet.  With the way two rcv protocols hook into
-		 * one socket family type, we cannot (easily) distinguish
-		 * what kind of SKB it is during backlog rcv.
-		 */
-		if (sock_owned_by_user(sk) == 0) {
-			/* We're no longer connect at the PPPOE layer,
-			 * and must wait for ppp channel to disconnect us.
-			 */
-			sk->sk_state = PPPOX_ZOMBIE;
-		}
-
-		bh_unlock_sock(sk);
		if (!schedule_work(&po->proto.pppoe.padt_work))
			sock_put(sk);
	}

Subsequently the PPPOX_ZOMBIE state can be completely removed?

Matt
Guillaume Nault Oct. 7, 2015, 10:32 a.m. UTC | #12
On Tue, Oct 06, 2015 at 09:12:18PM +0000, Matt Bennett wrote:
> On Tue, 2015-10-06 at 11:46 +0200, Guillaume Nault wrote:
> > On Tue, Oct 06, 2015 at 04:46:04AM +0000, Matt Bennett wrote:
> > > I don't know why the code isn't like the following anyway.
> > > 
> > > -if (sk->sk_state & (PPPOX_CONNECTED | PPPOX_BOUND | PPPOX_ZOMBIE)) {
> > > +if (po->pppoe_dev) {
> > > 	dev_put(po->pppoe_dev);
> > > 	po->pppoe_dev = NULL;
> > > }
> > I was thinking about that same approach. pppoe_release() is the only
> > function making that assumption. Other parts of the code seem to only
> > require that PPPOX_CONNECTED => pppoe_dev != NULL.
> > 
> > But I think the original condition was valid. Adding PPPOX_ZOMBIE into
> > the test and resetting pppoe_dev upon reception of PADT have changed the
> > relationship between sk_state and pppoe_dev, which is where the problem
> > stands.
> Yes originally the condition was valid. But I think the issue is plain
> to see when you look at the comment beside PPPOX_ZOMBIE declared in the
> enum.
> 
>  PPPOX_ZOMBIE	= 8,  /* dead, but still bound to ppp device */
> 
> We have seen in the situation we have described previously that we can
> be in this state without being bound to the ppp device.
> 
> In my opinion the entire logic around
> pppoe_disc_rcv()/pppoe_unbind_sock_work() looks wrong and I agree we
> should do what you suggested a few emails back.
> 
> i.e in pppoe_disc_rcv():
> 
> if (po) {
> 		struct sock *sk = sk_pppox(po);
> 
> -		bh_lock_sock(sk);
> -
> -		/* If the user has locked the socket, just ignore
> -		 * the packet.  With the way two rcv protocols hook into
> -		 * one socket family type, we cannot (easily) distinguish
> -		 * what kind of SKB it is during backlog rcv.
> -		 */
> -		if (sock_owned_by_user(sk) == 0) {
> -			/* We're no longer connect at the PPPOE layer,
> -			 * and must wait for ppp channel to disconnect us.
> -			 */
> -			sk->sk_state = PPPOX_ZOMBIE;
> -		}
> -
> -		bh_unlock_sock(sk);
> 		if (!schedule_work(&po->proto.pppoe.padt_work))
> 			sock_put(sk);
> 	}
> 
Yes, with the introduction of pppoe_unbind_sock_work(), setting
PPPOX_ZOMBIE shouldn't be required anymore.

> Subsequently the PPPOX_ZOMBIE state can be completely removed?
> 
Yes, this is the last place where PPPOX_ZOMBIE can be set.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/net/ppp/pppoe.c b/drivers/net/ppp/pppoe.c
index 3837ae3..2ed7506 100644
--- a/drivers/net/ppp/pppoe.c
+++ b/drivers/net/ppp/pppoe.c
@@ -313,7 +313,6 @@  static void pppoe_flush_dev(struct net_device *dev)
 			if (po->pppoe_dev == dev &&
 			    sk->sk_state & (PPPOX_CONNECTED | PPPOX_BOUND | PPPOX_ZOMBIE)) {
 				pppox_unbind_sock(sk);
-				sk->sk_state = PPPOX_ZOMBIE;
 				sk->sk_state_change(sk);
 				po->pppoe_dev = NULL;
 				dev_put(dev);