[net] tg3: Avoid NULL pointer dereference in tg3_io_error_detected()

Message ID 1475006728-15307-1-git-send-email-gpiccoli@linux.vnet.ibm.com
State Changes Requested, archived
Delegated to: David Miller

Commit Message

Guilherme G. Piccoli Sept. 27, 2016, 8:05 p.m. UTC
From: Milton Miller <miltonm@us.ibm.com>

While the driver is probing the adapter, an error may occur before the
netdev structure is allocated and attached to pci_dev. In this case, not
only is netdev unavailable, but the tg3 private structure is unavailable
too, since its address is just an offset computed from the NULL netdev
pointer, so dereferences of it must be skipped.
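
To illustrate why the private pointer is "just math", here is a minimal
userspace sketch. It assumes the private area sits directly behind the
netdev structure at a 32-byte aligned offset, in the spirit of the kernel's
netdev_priv(). All toy_* names, struct layouts and the TOY_ALIGN value are
hypothetical stand-ins, not the real tg3 or net_device definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-ins: the real struct net_device and struct tg3 are far larger. */
struct toy_netdev {
	char name[16];
	unsigned long state;
};

struct toy_priv {
	int irq;
	_Bool pcierr_recovery;
};

/* Assumed alignment of the private area (hypothetical value for this sketch). */
#define TOY_ALIGN 32u

/* Round x up to the next multiple of a; a must be a power of two. */
static uintptr_t toy_align_up(uintptr_t x, uintptr_t a)
{
	assert(a != 0 && (a & (a - 1)) == 0);
	return (x + a - 1) & ~(a - 1);
}

/*
 * Address of the private area for a netdev located at dev_addr.
 * Integer arithmetic is used so that passing 0 (NULL) is well defined here.
 */
static uintptr_t toy_priv_addr(uintptr_t dev_addr)
{
	return dev_addr + toy_align_up(sizeof(struct toy_netdev), TOY_ALIGN);
}

/* Address that a store to priv->pcierr_recovery would touch. */
static uintptr_t toy_flag_addr(uintptr_t dev_addr)
{
	return toy_priv_addr(dev_addr) +
	       offsetof(struct toy_priv, pcierr_recovery);
}
```

With dev_addr == 0, toy_flag_addr() yields a small non-zero address, so a
NULL test on the private pointer alone would not catch the problem. A small
faulting address such as the one reported in the trace below is consistent
with this kind of offset from NULL.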

The following trace is seen when the error is triggered:

  [1.402247] Unable to handle kernel paging request for data at address 0x00001a99
  [1.402410] Faulting instruction address: 0xc0000000007e33f8
  [1.402450] Oops: Kernel access of bad area, sig: 11 [#1]
  [1.402481] SMP NR_CPUS=2048 NUMA PowerNV
  [1.402513] Modules linked in:
  [1.402545] CPU: 0 PID: 651 Comm: eehd Not tainted 4.4.0-36-generic #55-Ubuntu
  [1.402591] task: c000001fe4e42a20 ti: c000001fe4e88000 task.ti: c000001fe4e88000
  [1.402742] NIP: c0000000007e33f8 LR: c0000000007e3164 CTR: c000000000595ea0
  [1.402787] REGS: c000001fe4e8b790 TRAP: 0300   Not tainted  (4.4.0-36-generic)
  [1.402832] MSR: 9000000100009033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 28000422  XER: 20000000
  [1.403058] CFAR: c000000000008468 DAR: 0000000000001a99 DSISR: 42000000 SOFTE: 1
  GPR00: c0000000007e3164 c000001fe4e8ba10 c0000000015c5e00 0000000000000000
  GPR04: 0000000000000001 0000000000000000 0000000000000039 0000000000000299
  GPR08: 0000000000000000 0000000000000001 c000001fe4e88000 0000000000000006
  GPR12: 0000000000000000 c00000000fb40000 c0000000000e6558 c000003ca1bffd00
  GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  GPR20: 0000000000000000 0000000000000000 0000000000000000 c000000000d52768
  GPR24: c000000000d52740 0000000000000100 c000003ca1b52000 0000000000000002
  GPR28: 0000000000000900 0000000000000000 c00000000152a0c0 c000003ca1b52000
  [1.404226] NIP [c0000000007e33f8] tg3_io_error_detected+0x308/0x340
  [1.404265] LR [c0000000007e3164] tg3_io_error_detected+0x74/0x340

This patch avoids the NULL pointer dereference by moving the access to
after the netdev NULL pointer check in tg3_io_error_detected().

Fixes: 0486a063b1ff ("tg3: prevent ifup/ifdown during PCI error recovery")
Fixes: dfc8f370316b ("net/tg3: Release IRQs on permanent error")
Tested-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com>
Signed-off-by: Milton Miller <miltonm@us.ibm.com>
Signed-off-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com>
---
 drivers/net/ethernet/broadcom/tg3.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

Comments

Michael Chan Sept. 27, 2016, 8:58 p.m. UTC | #1
On Tue, Sep 27, 2016 at 1:05 PM, Guilherme G. Piccoli
<gpiccoli@linux.vnet.ibm.com> wrote:
> From: Milton Miller <miltonm@us.ibm.com>
>
> This patch avoids the NULL pointer dereference by moving the access after
> the netdev NULL pointer check on tg3_io_error_detected().
>
> Fixes: 0486a063b1ff ("tg3: prevent ifup/ifdown during PCI error recovery")
> Fixes: dfc8f370316b ("net/tg3: Release IRQs on permanent error")
> Tested-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com>
> Signed-off-by: Milton Miller <miltonm@us.ibm.com>
> Signed-off-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com>

Looks good.  Do we need to add !netdev check in tg3_io_resume()?
Guilherme G. Piccoli Sept. 27, 2016, 9:27 p.m. UTC | #2
On 09/27/2016 05:58 PM, Michael Chan wrote:
> Looks good.  Do we need to add !netdev check in tg3_io_resume()?

Thanks Michael. It's a good point - I didn't trigger any error without 
the check, but looking at error handlers, every one seems to have this 
check except tg3_io_resume().

Do you want us to send a v2 including this check? Or maybe another patch?

Cheers,

Guilherme
Michael Chan Sept. 27, 2016, 9:42 p.m. UTC | #3
On Tue, Sep 27, 2016 at 2:27 PM, Guilherme G. Piccoli
<gpiccoli@linux.vnet.ibm.com> wrote:
> On 09/27/2016 05:58 PM, Michael Chan wrote:
>> Looks good.  Do we need to add !netdev check in tg3_io_resume()?
>
>
> Thanks Michael. It's a good point - I didn't trigger any error without the
> check, but looking at error handlers, every one seems to have this check
> except tg3_io_resume().
>
> Do you want us to send a v2 including this check? Or maybe another patch?
>

I think v2 should be fine.  The additional check is very much related
to the v1 patch.

I will ACK it once you send v2.  Thanks.
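
For illustration only, the kind of guard discussed above for the resume
handler might look like the following userspace sketch. It is not the v2
patch. The toy_* types and the restarts counter are hypothetical stand-ins
for pci_dev, net_device and the real restart work.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for net_device and pci_dev; not the kernel structures. */
struct toy_netdev {
	bool running;
	int restarts;
};

struct toy_pdev {
	struct toy_netdev *drvdata;	/* NULL until probe attaches a netdev */
};

/* Resume handler sketch: bail out before touching any netdev state. */
static void toy_io_resume(struct toy_pdev *pdev)
{
	struct toy_netdev *netdev;

	assert(pdev != NULL);
	netdev = pdev->drvdata;

	/* Same guard the other error handlers use: no netdev, or not running. */
	if (!netdev || !netdev->running)
		return;

	/* Placeholder for the real work of restarting the interface. */
	netdev->restarts++;
}
```

Calling toy_io_resume() on a device whose drvdata is still NULL returns
without touching anything, which is the behaviour the extra check is meant
to provide.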

Patch

diff --git a/drivers/net/ethernet/broadcom/tg3.c b/drivers/net/ethernet/broadcom/tg3.c
index a2551bc..3a5fce7 100644
--- a/drivers/net/ethernet/broadcom/tg3.c
+++ b/drivers/net/ethernet/broadcom/tg3.c
@@ -18122,14 +18122,14 @@  static pci_ers_result_t tg3_io_error_detected(struct pci_dev *pdev,
 
 	rtnl_lock();
 
-	/* We needn't recover from permanent error */
-	if (state == pci_channel_io_frozen)
-		tp->pcierr_recovery = true;
-
 	/* We probably don't have netdev yet */
 	if (!netdev || !netif_running(netdev))
 		goto done;
 
+	/* We needn't recover from permanent error */
+	if (state == pci_channel_io_frozen)
+		tp->pcierr_recovery = true;
+
 	tg3_phy_stop(tp);
 
 	tg3_netif_stop(tp);
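
The reordering in the hunk above can be modelled in userspace as follows.
This is a simplified sketch under stated assumptions: the toy_* types, the
stopped flag and the boolean return value are hypothetical, and the sketch
derives the private pointer only after the guard, whereas the driver
computes it up front and merely avoids dereferencing it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum toy_channel_state { TOY_IO_NORMAL, TOY_IO_FROZEN, TOY_IO_PERM_FAILURE };

/* Toy stand-ins for struct tg3, net_device and pci_dev. */
struct toy_priv {
	bool pcierr_recovery;
	bool stopped;
};

struct toy_netdev {
	bool running;
	struct toy_priv priv;
};

struct toy_pdev {
	struct toy_netdev *drvdata;	/* NULL while probe has not attached it */
};

/* Returns true if the device was quiesced, false if there was nothing to do. */
static bool toy_io_error_detected(struct toy_pdev *pdev,
				  enum toy_channel_state state)
{
	struct toy_netdev *netdev;
	struct toy_priv *tp;

	assert(pdev != NULL);
	netdev = pdev->drvdata;

	/* Guard first: without a netdev there is no private data to touch. */
	if (!netdev || !netdev->running)
		return false;

	/* Only now is it safe to write through the private pointer. */
	tp = &netdev->priv;
	if (state == TOY_IO_FROZEN)
		tp->pcierr_recovery = true;

	tp->stopped = true;	/* placeholder for the phy/netif stop calls */
	return true;
}
```

With drvdata left NULL, as during an early probe failure, the function
returns false and never writes the recovery flag.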