diff mbox

[OOPS] In __netif_receive_skb_core

Message ID 1452457610.16445.13.camel@edumazet-glaptop2.roam.corp.google.com
State RFC, archived
Delegated to: David Miller
Headers show

Commit Message

Eric Dumazet Jan. 10, 2016, 8:26 p.m. UTC
On Sun, 2016-01-10 at 19:48 +0200, Ivaylo Dimitrov wrote:
> Anyone?
> 
> On  7.01.2016 19:54, Ivaylo Dimitrov wrote:
> > Hi,
> >
> > Trying to bring up the modem interface on Nokia n900 device with recent
> > linux, leads to the following kernel oops (read from the mtdoops device):
> >
> > <6>[  144.009765] bq27xxx-battery 2-0055: battery is not calibrated!
> > ignoring capacity values
> > <5>[  145.88964] ssi-protocol ssi-protocol: WAKELINES TEST OK
> > <6>[  145.194793] IPv6: ADDRCONF(NETDEV_CHANGE): phonet0: link becomes
> > ready
> > <1>[  145.358154] Unable to handle kernel NULL pointer dereference at
> > virtual address 0000005c
> > <1>[  145.366821] pgd = ce530000
> > <1>[  145.369689] [0000005c] *pgd=8e44c831, *pte=00000000, *ppte=00000000
> > <0>[  145.376373] Internal error: Oops: 17 [#1] PREEMPT ARM
> > <4>[  145.381683] Modules linked in: cmt_speech nokia_modem ssi_protocol
> > sha256_generic hmac drbg ansi_cprng ctr ccm rfcomm sd_mod scsi_mod
> > bnep"bluetooth omaplfb pvrsrvkm ipv6 bq2415x_charger uinput hsi_char
> > radio_platform_si4713 joydev omap_ssi_port video_bus_switch arc4
> > wl1251_spi gpio_keys isp1704_charger wl1251 mac80211 smc91x mii cfg80211
> > omap_wdt omap_sham crc7 tsc2005 tsc200x_core si4713 bq27xxx_battery
> > leds_lp5523 leds_lp55xx_common adp1653 tsl2563 rtc_twl twl4030_wdt
> > et8ek8 ad5820 v4l2_common smiaregs videodev twl4030_vibra ff_memless
> > media lis3lv02d_i2c lis3lv02d input_polldev omap_ssi hsi ti_soc_thermal
> > thermal_sys hwmon rx51_battery
> > <4>[  145.441802] CPU: 0 PID: 1040 Comm: csd Not tainted 4.4.0-rc7+ #2
> > <4>[  145.448120] Hardware name: Nokia RX-51 board
> > <4>[  145.452636] task: ce517700 ti: cef50000 task.ti: cef50000
> > <4>[  145.458343] PC is at __netif_receive_skb_core+0x7c0/0x92c
> > <4>[  145.464050] LR is at sock_queue_rcv_skb+0x208/0x214
> > <4>[  145.469207] pc : [<c0393ebc>]    lr : [<c03852f4>]    psr: 00 00113
> > <4>[  145.469207] sp : cef51e98  ip : 15800000  fp : c3a4a780
> > <4>[  145.481292] r10: c3a2005c  r9 : 00000000  r8 : c3987834
> > <4>[  145.486816] r7 : 0000f500  r6 : c3a20000  r5 : c3a20048  r4 :
> > c3987780
> > <4>[  145.493682] r3 : 00000000  r2 : 00000002  r1 : 00000000  r0 :
> > 00000000
> > <4>[  145.500579] Flags: nzcv  IRQs on  FIQs on  Mode SVC_32  ISA ARM
> > Segment none
> > <4>[  145.508087] Control: 10c5387d  Table: 8e530019  DAC: 00000051
> > <0>[  145.514160] Process csd (pid: 1040, stack limit = 0xcef50210)
> > <0>[  145.520202] Stack: (0xcef51e98 to 0xcef52000)
> > <0>[  145.524810] 1e80:       cf334000 cf3ccd90
> > <0>[  145.533447] 1ea0: 00000002 c3987780 c0651400 c3a5507c c3912000
> > c0068534 c3a5507c 40000113
> > <0>[  145.542083] 1ec0: ffffffff bf3b5fac bf3b5d88 ffffffff 00000000
> > c0675f80 c3987780 00000000
> > <0>[  145.550720] 1ee0: c0675f74 c0675f48 cef51f18 00000040 cef51f20
> > c03965f4 c0396580 c0675f80
> > <0>[  145.559356] 1f00: 00000001 0000012c 00000040 c0676217 ffffc399
> > c0396d48 cef51f18 cef51f18
> > <0>[  145.567993] 1f20: cef51f20 cef51f20 cf334000 00000008 c0677240
> > 00000008 c0677200 c067724c
> > <0>[  145.576629] 1f40: 00000100 00400100 c0440be4 c0030dec cf807f00
> > fa2000cc 00000003 00000003
> > <0>[  145.585266] 1f60: ffffc398 00000004 10c53c7d 00000000 00000000
> > cf802000 00000001 10c53c7d
> > <0>[  145.593902] 1f80: b6ea4f44 00001684 bee8e7dc c003119c 00000000
> > c005b44c cef51fb0 b6e1b024
> > <0>[  145.602539] 1fa0: a0000010 ffffffff 10c5387d c043c610 b6ea71bc
> > 00000000 00000000 00002f48
> > <0>[  145.611175] 1fc0: 00000000 b6ea71bc 00002f48 b6ebb684 b6eba000
> > b6ea4f44 00001684 bee8e7dc
> > <0>[  145.619812] 1fe0: 00019817 bee8e788 b6e19d0c b6e1b024 a0000010
> > ffffffff 00000000 00000000
> > <4>[  145.628648] [<c0393ebc>] (__netif_receive_skb_core) from
> > [<c03965f4>] (process_backlog+0x74/0xf4)
> > <4>[  145.637817] [<c03965f4>] (process_backlog) from [<c0396d48>]
> > (net_rx_action+0xd0/0x284)
> > <4>[  145.646301] [<c0396d48>] (net_rx_action) from [<c0030dec>]
> > (__do_softirq+0xb0/0x208)
> > <4>[  145.654479] [<c0030dec>] (__do_softirq) from [<c003119c>]
> > (irq_exit+0x80/0xe4)
> > <4>[  145.662109] [<c003119c>] (irq_exit) from [<c005b44c>]
> > (__handle_domain_irq+0x88/0xa8)
> > <4>[  145.670379] [<c005b44c>] (__handle_domain_irq) from [<c043c610>]
> > (__irq_usr+0x50/0x80)
> > <0>[  145.678741] Code: e59d400c e5943014 e1530006 0a000025 (e593505c)
> > <4>[  145.685241] ---[ end trace 17f822c9893a7c21 ]---
> > <0>[  145.691864] Kernel pan)c - not syncing: Fatal exception in interrupt
> >
> > I tracked the problem down to the commit
> > <7866a621043fbaca3d7389e9b9f69dd1a2e5a855> ("dev: add per net_device
> > packet type chains"). After reverting that commit, the oops no longer
> > appear.
> >
> > Userspace on Nokia N900 talks to the modem via phonet interface.
> >
> > Please advice on how to proceed to fix the problem and if there is
> > anything else I can provide.
> >

I do not see anything wrong with this commit.

It must uncover a prior bug.

Seems to be a phonet bug not reacting to NETDEV_DOWN maybe ?

Comments

Ivaylo Dimitrov Jan. 11, 2016, 9:03 p.m. UTC | #1
On 10.01.2016 22:26, Eric Dumazet wrote:
>
> I do not see anything wrong with this commit.
>
> It must uncover a prior bug.
>

Could be, but it was working fine in 3.19.

> Seems to be a phonet bug not reacting to NETDEV_DOWN maybe ?
>
>
> diff --git a/net/phonet/pn_dev.c b/net/phonet/pn_dev.c
> index a58680016472..fd2f44940bd7 100644
> --- a/net/phonet/pn_dev.c
> +++ b/net/phonet/pn_dev.c

The patch makes difference:

<3>[  118.943542] et8ek8 3-003e: could not get clock
<6>[  119.028045] IPv6: ADDRCONF(NETDEV_UP): phonet0: link is not ready
<4>[  120.945922] sched: RT throttling activated
<5>[  158.556091] ssi-protocol ssi-protocol: WAKELINES TEST OK
<6>[  158.561828] IPv6: ADDRCONF(NETDEV_CHANGE): phonet0: link becomes ready
<1>[  158.708831] Unable to handle kernel NULL pointer dereference at 
virtual address 0000005c
<1>[  158.717498] pgd = ce548000
<1>[  158.720367] [0000005c] *pgd=8e53a831, *pte=00000000, *ppte=00000000
<0>[  158.727020] Internal error: Oops: 17 [#1] PREEMPT ARM
<4>[  158.732330] Modules linked in: cmt_speech nokia_modem ssi_protocol 
vfat fat rfcomm sd_mod scsi_mod bnep bluetooth omaplfb pvrsrvkm ipv6 
bq2415x_charger uinput hsi_char radio_platform_si4713 joydev 
omap_ssi_port video_bus_switch arc4 wl1251_spi wl1251 isp1704_charger 
gpio_keys mac80211 smc91x mii cfg80211 omap_wdt omap_sham crc7 tsc2005 
tsc200x_core si4713 leds_lp5523 leds_lp55xx_common adp1653 
bq27xxx_battery tsl2563 twl4030_wdt rtc_twl et8ek8 smiaregs ad5820 
v4l2_common twl4030_vibra ff_memless videodev lis3lv02d_i2c media 
lis3lv02d input_polldev omap_ssi hsi ti_soc_thermal thermal_sys hwmon 
rx51_battery
<4>[  158.789154] CPU: 0 PID: 1038 Comm: csd Not tainted 4.4.0-rc7+ #14
<4>[  158.795562] Hardware name: Nokia RX-51 board
<4>[  158.800048] task: ce4d5900 ti: ce4e4000 task.ti: ce4e4000
<4>[  158.805725] PC is at __netif_receive_skb_core+0x7c0/0x92c
<4>[  158.811401] LR is at sock_queue_rcv_skb+0x208/0x214
<4>[  158.816528] pc : [<c0394218>]    lr : [<c0385648>]    psr: 00000113
<4>[  158.816528] sp : ce4e5e98  ip : 14400000  fp : cd4ab380
<4>[  158.828582] r10: c249405c  r9 : 00000000  r8 : c24b2d74
<4>[  158.834045] r7 : 0000f500  r6 : c2494000  r5 : c2494048  r4 : c24b2cc0
<4>[  158.840911] r3 : 00000000  r2 : 00000002  r1 : 00000000  r0 : 00000000
<4>[  158.847778] Flags: nzcv  IRQs on  FIQs on  Mode SVC_32  ISA ARM 
Segment none
<4>[  158.855285] Control: 10c5387d  Table: 8e548019  DAC: 00000051
<0>[  158.861297] Process csd (pid: 1038, stack limit = 0xce4e4210)
<0>[  158.867340] Stack: (0xce4e5e98 to 0xce4e6000)
<0>[  158.871917] 5e80: 
       cf3ab000 cf2e4d90
<0>[  158.880554] 5ea0: 00000002 c24b2cc0 c06534dc c243b67c c5dcae00 
c00685ac c243b67c 40000113
<0>[  158.889129] 5ec0: ffffffff bf3b0fac bf3b0d88 ffffffff 00000000 
c06780c0 c24b2cc0 00000000
<0>[  158.897735] 5ee0: c06780b4 c0678088 ce4e5f18 00000040 ce4e5f20 
c0396950 c03968dc c06780c0
<0>[  158.906341] 5f00: 00000001 0000012c 00000040 c0678357 ffffc8d0 
c03970a4 ce4e5f18 ce4e5f18
<0>[  158.914947] 5f20: ce4e5f20 ce4e5f20 cf3ab000 00000008 c0679380 
00000008 c0679340 c067938c
<0>[  158.923522] 5f40: 00000100 00400100 c0441be8 c0030e64 cf807f00 
fa2000cc 00000003 00000003
<0>[  158.932128] 5f60: ffffc8cf 00000004 10c53c7d 00000000 00000000 
cf802000 00000001 10c53c7d
<0>[  158.940734] 5f80: 00001a30 0000026c beba476c c0031214 00000000 
c005b4c4 ce4e5fb0 b6e3f82c
<0>[  158.949310] 5fa0: 40000010 ffffffff 10c5387d c043c990 00000014 
00000000 0001ef78 00000014
<0>[  158.957946] 5fc0: 0001edf8 00000001 00000000 00000000 00000000 
00001a30 0000026c beba476c
<0>[  158.966522] 5fe0: b6e4d82c beba4748 b6e3f82c b6e3f82c 40000010 
ffffffff ddee3fd6 e7f5d5dd
<4>[  158.975158] [<c0394218>] (__netif_receive_skb_core) from 
[<c0396950>] (process_backlog+0x74/0xf4)
<4>[  158.984497] [<c0396950>] (process_backlog) from [<c03970a4>] 
(net_rx_action+0xd0/0x284)
<4>[  158.992919] [<c03970a4>] (net_rx_action) from [<c0030e64>] 
(__do_softirq+0xb0/0x208)
<4>[  159.001037] [<c0030e64>] (__do_softirq) from [<c0031214>] 
(irq_exit+0x80/0xe4)
<4>[  159.008636] [<c0031214>] (irq_exit) from [<c005b4c4>] 
(__handle_domain_irq+0x88/0xa8)
<4>[  159.016906] [<c005b4c4>] (__handle_domain_irq) from [<c043c990>] 
(__irq_usr+0x50/0x80)
<0>[  159.025207] Code: e59d400c e5943014 e1530006 0a000025 (e593505c)
<4>[  159.031677] ---[ end trace c52d1e36f2d07d67 ]---
<0>[  159.038146] Kernel panic - not syncing: Fatal exception in interrupt
--[ end trace c52d1e36f2d07d67 ]---

Could you provide some hints on where to put some printks or dump some 
data to get a clearer picture on what happens?

Thanks,
Ivo
Salam Noureddine Jan. 11, 2016, 10:11 p.m. UTC | #2
Would you be able to disassemble your kernel so we could tell where
the null pointer dereference happens?

Thanks,

Salam

On Mon, Jan 11, 2016 at 1:03 PM, Ivaylo Dimitrov
<ivo.g.dimitrov.75@gmail.com> wrote:
>
>
> On 10.01.2016 22:26, Eric Dumazet wrote:
>>
>>
>> I do not see anything wrong with this commit.
>>
>> It must uncover a prior bug.
>>
>
> Could be, but it was working fine in 3.19.
>
>> Seems to be a phonet bug not reacting to NETDEV_DOWN maybe ?
>>
>>
>> diff --git a/net/phonet/pn_dev.c b/net/phonet/pn_dev.c
>> index a58680016472..fd2f44940bd7 100644
>> --- a/net/phonet/pn_dev.c
>> +++ b/net/phonet/pn_dev.c
>
>
> The patch makes difference:
>
> <3>[  118.943542] et8ek8 3-003e: could not get clock
> <6>[  119.028045] IPv6: ADDRCONF(NETDEV_UP): phonet0: link is not ready
> <4>[  120.945922] sched: RT throttling activated
> <5>[  158.556091] ssi-protocol ssi-protocol: WAKELINES TEST OK
> <6>[  158.561828] IPv6: ADDRCONF(NETDEV_CHANGE): phonet0: link becomes ready
> <1>[  158.708831] Unable to handle kernel NULL pointer dereference at
> virtual address 0000005c
> <1>[  158.717498] pgd = ce548000
> <1>[  158.720367] [0000005c] *pgd=8e53a831, *pte=00000000, *ppte=00000000
> <0>[  158.727020] Internal error: Oops: 17 [#1] PREEMPT ARM
> <4>[  158.732330] Modules linked in: cmt_speech nokia_modem ssi_protocol
> vfat fat rfcomm sd_mod scsi_mod bnep bluetooth omaplfb pvrsrvkm ipv6
> bq2415x_charger uinput hsi_char radio_platform_si4713 joydev omap_ssi_port
> video_bus_switch arc4 wl1251_spi wl1251 isp1704_charger gpio_keys mac80211
> smc91x mii cfg80211 omap_wdt omap_sham crc7 tsc2005 tsc200x_core si4713
> leds_lp5523 leds_lp55xx_common adp1653 bq27xxx_battery tsl2563 twl4030_wdt
> rtc_twl et8ek8 smiaregs ad5820 v4l2_common twl4030_vibra ff_memless videodev
> lis3lv02d_i2c media lis3lv02d input_polldev omap_ssi hsi ti_soc_thermal
> thermal_sys hwmon rx51_battery
> <4>[  158.789154] CPU: 0 PID: 1038 Comm: csd Not tainted 4.4.0-rc7+ #14
> <4>[  158.795562] Hardware name: Nokia RX-51 board
> <4>[  158.800048] task: ce4d5900 ti: ce4e4000 task.ti: ce4e4000
> <4>[  158.805725] PC is at __netif_receive_skb_core+0x7c0/0x92c
> <4>[  158.811401] LR is at sock_queue_rcv_skb+0x208/0x214
> <4>[  158.816528] pc : [<c0394218>]    lr : [<c0385648>]    psr: 00000113
> <4>[  158.816528] sp : ce4e5e98  ip : 14400000  fp : cd4ab380
> <4>[  158.828582] r10: c249405c  r9 : 00000000  r8 : c24b2d74
> <4>[  158.834045] r7 : 0000f500  r6 : c2494000  r5 : c2494048  r4 : c24b2cc0
> <4>[  158.840911] r3 : 00000000  r2 : 00000002  r1 : 00000000  r0 : 00000000
> <4>[  158.847778] Flags: nzcv  IRQs on  FIQs on  Mode SVC_32  ISA ARM
> Segment none
> <4>[  158.855285] Control: 10c5387d  Table: 8e548019  DAC: 00000051
> <0>[  158.861297] Process csd (pid: 1038, stack limit = 0xce4e4210)
> <0>[  158.867340] Stack: (0xce4e5e98 to 0xce4e6000)
> <0>[  158.871917] 5e80:       cf3ab000 cf2e4d90
> <0>[  158.880554] 5ea0: 00000002 c24b2cc0 c06534dc c243b67c c5dcae00
> c00685ac c243b67c 40000113
> <0>[  158.889129] 5ec0: ffffffff bf3b0fac bf3b0d88 ffffffff 00000000
> c06780c0 c24b2cc0 00000000
> <0>[  158.897735] 5ee0: c06780b4 c0678088 ce4e5f18 00000040 ce4e5f20
> c0396950 c03968dc c06780c0
> <0>[  158.906341] 5f00: 00000001 0000012c 00000040 c0678357 ffffc8d0
> c03970a4 ce4e5f18 ce4e5f18
> <0>[  158.914947] 5f20: ce4e5f20 ce4e5f20 cf3ab000 00000008 c0679380
> 00000008 c0679340 c067938c
> <0>[  158.923522] 5f40: 00000100 00400100 c0441be8 c0030e64 cf807f00
> fa2000cc 00000003 00000003
> <0>[  158.932128] 5f60: ffffc8cf 00000004 10c53c7d 00000000 00000000
> cf802000 00000001 10c53c7d
> <0>[  158.940734] 5f80: 00001a30 0000026c beba476c c0031214 00000000
> c005b4c4 ce4e5fb0 b6e3f82c
> <0>[  158.949310] 5fa0: 40000010 ffffffff 10c5387d c043c990 00000014
> 00000000 0001ef78 00000014
> <0>[  158.957946] 5fc0: 0001edf8 00000001 00000000 00000000 00000000
> 00001a30 0000026c beba476c
> <0>[  158.966522] 5fe0: b6e4d82c beba4748 b6e3f82c b6e3f82c 40000010
> ffffffff ddee3fd6 e7f5d5dd
> <4>[  158.975158] [<c0394218>] (__netif_receive_skb_core) from [<c0396950>]
> (process_backlog+0x74/0xf4)
> <4>[  158.984497] [<c0396950>] (process_backlog) from [<c03970a4>]
> (net_rx_action+0xd0/0x284)
> <4>[  158.992919] [<c03970a4>] (net_rx_action) from [<c0030e64>]
> (__do_softirq+0xb0/0x208)
> <4>[  159.001037] [<c0030e64>] (__do_softirq) from [<c0031214>]
> (irq_exit+0x80/0xe4)
> <4>[  159.008636] [<c0031214>] (irq_exit) from [<c005b4c4>]
> (__handle_domain_irq+0x88/0xa8)
> <4>[  159.016906] [<c005b4c4>] (__handle_domain_irq) from [<c043c990>]
> (__irq_usr+0x50/0x80)
> <0>[  159.025207] Code: e59d400c e5943014 e1530006 0a000025 (e593505c)
> <4>[  159.031677] ---[ end trace c52d1e36f2d07d67 ]---
> <0>[  159.038146] Kernel panic - not syncing: Fatal exception in interrupt
> --[ end trace c52d1e36f2d07d67 ]---
>
> Could you provide some hints on where to put some printks or dump some data
> to get a clearer picture on what happens?
>
> Thanks,
> Ivo
Ivaylo Dimitrov Jan. 12, 2016, 12:51 a.m. UTC | #3
On 12.01.2016 00:11, Salam Noureddine wrote:
> Would you be able to disassemble your kernel so we could tell where
> the null pointer dereference happens?
>

Sure, but wouldn't it be better to provide the object file containing 
the debug symbols as well?

Otherwise, the null pointer dereference happens somewhere in:

(gdb) l *__netif_receive_skb_core+0x7c0
0x1318 is in __netif_receive_skb_core (include/linux/compiler.h:218).
213	})
214	
215	static __always_inline
216	void __read_once_size(const volatile void *p, void *res, int size)
217	{
218		__READ_ONCE_SIZE;
219	}
220	
221	#ifdef CONFIG_KASAN
222	/*

(gdb) l *__netif_receive_skb_core+0x7bc
0x1314 is in __netif_receive_skb_core (net/core/dev.c:3934).
3929		}
3930	
3931		deliver_ptype_list_skb(skb, &pt_prev, orig_dev, type,
3932				       &orig_dev->ptype_specific);
3933	
3934		if (unlikely(skb->dev != orig_dev)) {
3935			deliver_ptype_list_skb(skb, &pt_prev, orig_dev, type,
3936					       &skb->dev->ptype_specific);
3937		}
3938	

(gdb) l *__netif_receive_skb_core+0x7c4
0x131c is in __netif_receive_skb_core (net/core/dev.c:3935).
3930	
3931		deliver_ptype_list_skb(skb, &pt_prev, orig_dev, type,
3932				       &orig_dev->ptype_specific);
3933	
3934		if (unlikely(skb->dev != orig_dev)) {
3935			deliver_ptype_list_skb(skb, &pt_prev, orig_dev, type,
3936					       &skb->dev->ptype_specific);
3937		}
3938	
3939		if (pt_prev) {


0x00001300 <+1960>:	cmp	r10, r3
0x00001304 <+1964>:	bne	0x1284 <__netif_receive_skb_core+1836>
0x00001308 <+1968>:	ldr	r4, [sp, #12]
0x0000130c <+1972>:	ldr	r3, [r4, #20]
0x00001310 <+1976>:	cmp	r3, r6
0x00001314 <+1980>:	beq	0x13b0 <__netif_receive_skb_core+2136>

0x00001318 <+1984>:	ldr	r5, [r3, #92]	; 0x5c   <-FAULT

r3 seems to be skb->dev

0x0000131c <+1988>:	add	r10, r3, #92	; 0x5c
0x00001320 <+1992>:	add	r8, r4, #180	; 0xb4
0x00001324 <+1996>:	sub	r5, r5, #20
0x00001328 <+2000>:	b	0x13a4 <__netif_receive_skb_core+2124>
0x0000132c <+2004>:	ldrh	r3, [r5]
0x00001330 <+2008>:	cmp	r3, r7


I put some additional printks around that code, and it turned out that 
skb->dev is null, so "if (unlikely(skb->dev != orig_dev))" succeeds, but 
"&skb->dev->ptype_specific" oopses.

Thanks,
Ivo
Eric Dumazet Jan. 12, 2016, 1:06 a.m. UTC | #4
On Tue, 2016-01-12 at 02:51 +0200, Ivaylo Dimitrov wrote:

> 
> I put some additional printks around that code, and it turned out that 
> skb->dev is null, so "if (unlikely(skb->dev != orig_dev))" succeeds, but 
> "&skb->dev->ptype_specific" oopses.

Nice find !

Now lets find what possibly called netif_rx() with skb->dev == NULL 

(This is illegal)
Salam Noureddine Jan. 12, 2016, 1:19 a.m. UTC | #5
It must be that skb->dev was changed to NULL inside of
__netif_receive_skb_core, otherwise we would have crashed much
earlier. Also, orig_dev is saved at the beginning. Possibly a device
is layered on top of the original device.

On Mon, Jan 11, 2016 at 5:06 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Tue, 2016-01-12 at 02:51 +0200, Ivaylo Dimitrov wrote:
>
>>
>> I put some additional printks around that code, and it turned out that
>> skb->dev is null, so "if (unlikely(skb->dev != orig_dev))" succeeds, but
>> "&skb->dev->ptype_specific" oopses.
>
> Nice find !
>
> Now lets find what possibly called netif_rx() with skb->dev == NULL
>
> (This is illegal)
>
>
>
Eric Dumazet Jan. 12, 2016, 2:21 a.m. UTC | #6
On Mon, 2016-01-11 at 17:19 -0800, Salam Noureddine wrote:
> It must be that skb->dev was changed to NULL inside of
> __netif_receive_skb_core, otherwise we would have crashed much
> earlier. Also, orig_dev is saved at the beginning. Possibly a device
> is layered on top of the original device.

Please do not top post on netdev / lkml mailing lists.

My guess is a protocol handler queued the skb without calling
skb_share_check()
diff mbox

Patch

diff --git a/net/phonet/pn_dev.c b/net/phonet/pn_dev.c
index a58680016472..fd2f44940bd7 100644
--- a/net/phonet/pn_dev.c
+++ b/net/phonet/pn_dev.c
@@ -63,11 +63,11 @@  struct phonet_device_list *phonet_device_list(struct net *net)
 static struct phonet_device *__phonet_device_alloc(struct net_device *dev)
 {
 	struct phonet_device_list *pndevs = phonet_device_list(dev_net(dev));
-	struct phonet_device *pnd = kmalloc(sizeof(*pnd), GFP_ATOMIC);
-	if (pnd == NULL)
+	struct phonet_device *pnd = kzalloc(sizeof(*pnd), GFP_KERNEL);
+
+	if (!pnd)
 		return NULL;
 	pnd->netdev = dev;
-	bitmap_zero(pnd->addrs, 64);
 
 	BUG_ON(!mutex_is_locked(&pndevs->lock));
 	list_add_rcu(&pnd->list, &pndevs->list);
@@ -117,7 +117,7 @@  static void phonet_device_destroy(struct net_device *dev)
 
 		for_each_set_bit(addr, pnd->addrs, 64)
 			phonet_address_notify(RTM_DELADDR, dev, addr);
-		kfree(pnd);
+		kfree_rcu(pnd, rcu);
 	}
 }
 
@@ -301,6 +301,7 @@  static int phonet_device_notify(struct notifier_block *me, unsigned long what,
 		if (dev->type == ARPHRD_PHONET)
 			phonet_device_autoconf(dev);
 		break;
+	case NETDEV_DOWN:
 	case NETDEV_UNREGISTER:
 		phonet_device_destroy(dev);
 		phonet_route_autodel(dev);