Message ID | 1475006728-15307-1-git-send-email-gpiccoli@linux.vnet.ibm.com |
---|---|
State | Changes Requested, archived |
Delegated to: | David Miller |
Headers | show |
On Tue, Sep 27, 2016 at 1:05 PM, Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com> wrote: > From: Milton Miller <miltonm@us.ibm.com> > > While the driver is probing the adapter, an error may occur before the > netdev structure is allocated and attached to pci_dev. In this case, > not only netdev isn't available, but the tg3 private structure is also > not available as it is just math from the NULL pointer, so dereferences > must be skipped. > > The following trace is seen when the error is triggered: > > [1.402247] Unable to handle kernel paging request for data at address 0x00001a99 > [1.402410] Faulting instruction address: 0xc0000000007e33f8 > [1.402450] Oops: Kernel access of bad area, sig: 11 [#1] > [1.402481] SMP NR_CPUS=2048 NUMA PowerNV > [1.402513] Modules linked in: > [1.402545] CPU: 0 PID: 651 Comm: eehd Not tainted 4.4.0-36-generic #55-Ubuntu > [1.402591] task: c000001fe4e42a20 ti: c000001fe4e88000 task.ti: c000001fe4e88000 > [1.402742] NIP: c0000000007e33f8 LR: c0000000007e3164 CTR: c000000000595ea0 > [1.402787] REGS: c000001fe4e8b790 TRAP: 0300 Not tainted (4.4.0-36-generic) > [1.402832] MSR: 9000000100009033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 28000422 XER: 20000000 > [1.403058] CFAR: c000000000008468 DAR: 0000000000001a99 DSISR: 42000000 SOFTE: 1 > GPR00: c0000000007e3164 c000001fe4e8ba10 c0000000015c5e00 0000000000000000 > GPR04: 0000000000000001 0000000000000000 0000000000000039 0000000000000299 > GPR08: 0000000000000000 0000000000000001 c000001fe4e88000 0000000000000006 > GPR12: 0000000000000000 c00000000fb40000 c0000000000e6558 c000003ca1bffd00 > GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > GPR20: 0000000000000000 0000000000000000 0000000000000000 c000000000d52768 > GPR24: c000000000d52740 0000000000000100 c000003ca1b52000 0000000000000002 > GPR28: 0000000000000900 0000000000000000 c00000000152a0c0 c000003ca1b52000 > [1.404226] NIP [c0000000007e33f8] tg3_io_error_detected+0x308/0x340 > [1.404265] LR [c0000000007e3164] tg3_io_error_detected+0x74/0x340 > > This patch avoids the NULL pointer dereference by moving the access after > the netdev NULL pointer check on tg3_io_error_detected(). > > Fixes: 0486a063b1ff ("tg3: prevent ifup/ifdown during PCI error recovery") > Fixes: dfc8f370316b ("net/tg3: Release IRQs on permanent error") > Tested-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com> > Signed-off-by: Milton Miller <miltonm@us.ibm.com> > Signed-off-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com> Looks good. Do we need to add !netdev check in tg3_io_resume()?
On 09/27/2016 05:58 PM, Michael Chan wrote: > On Tue, Sep 27, 2016 at 1:05 PM, Guilherme G. Piccoli > <gpiccoli@linux.vnet.ibm.com> wrote: >> From: Milton Miller <miltonm@us.ibm.com> >> >> While the driver is probing the adapter, an error may occur before the >> netdev structure is allocated and attached to pci_dev. In this case, >> not only netdev isn't available, but the tg3 private structure is also >> not available as it is just math from the NULL pointer, so dereferences >> must be skipped. >> >> The following trace is seen when the error is triggered: >> >> [1.402247] Unable to handle kernel paging request for data at address 0x00001a99 >> [1.402410] Faulting instruction address: 0xc0000000007e33f8 >> [1.402450] Oops: Kernel access of bad area, sig: 11 [#1] >> [1.402481] SMP NR_CPUS=2048 NUMA PowerNV >> [1.402513] Modules linked in: >> [1.402545] CPU: 0 PID: 651 Comm: eehd Not tainted 4.4.0-36-generic #55-Ubuntu >> [1.402591] task: c000001fe4e42a20 ti: c000001fe4e88000 task.ti: c000001fe4e88000 >> [1.402742] NIP: c0000000007e33f8 LR: c0000000007e3164 CTR: c000000000595ea0 >> [1.402787] REGS: c000001fe4e8b790 TRAP: 0300 Not tainted (4.4.0-36-generic) >> [1.402832] MSR: 9000000100009033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 28000422 XER: 20000000 >> [1.403058] CFAR: c000000000008468 DAR: 0000000000001a99 DSISR: 42000000 SOFTE: 1 >> GPR00: c0000000007e3164 c000001fe4e8ba10 c0000000015c5e00 0000000000000000 >> GPR04: 0000000000000001 0000000000000000 0000000000000039 0000000000000299 >> GPR08: 0000000000000000 0000000000000001 c000001fe4e88000 0000000000000006 >> GPR12: 0000000000000000 c00000000fb40000 c0000000000e6558 c000003ca1bffd00 >> GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 >> GPR20: 0000000000000000 0000000000000000 0000000000000000 c000000000d52768 >> GPR24: c000000000d52740 0000000000000100 c000003ca1b52000 0000000000000002 >> GPR28: 0000000000000900 0000000000000000 c00000000152a0c0 c000003ca1b52000 >> [1.404226] NIP [c0000000007e33f8] tg3_io_error_detected+0x308/0x340 >> [1.404265] LR [c0000000007e3164] tg3_io_error_detected+0x74/0x340 >> >> This patch avoids the NULL pointer dereference by moving the access after >> the netdev NULL pointer check on tg3_io_error_detected(). >> >> Fixes: 0486a063b1ff ("tg3: prevent ifup/ifdown during PCI error recovery") >> Fixes: dfc8f370316b ("net/tg3: Release IRQs on permanent error") >> Tested-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com> >> Signed-off-by: Milton Miller <miltonm@us.ibm.com> >> Signed-off-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com> > > Looks good. Do we need to add !netdev check in tg3_io_resume()? Thanks Michael. It's a good point - I didn't trigger any error without the check, but looking at error handlers, every one seems to have this check except tg3_io_resume(). Do you want us to send a v2 including this check? Or maybe another patch? Cheers, Guilherme
On Tue, Sep 27, 2016 at 2:27 PM, Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com> wrote: > On 09/27/2016 05:58 PM, Michael Chan wrote: >> >> On Tue, Sep 27, 2016 at 1:05 PM, Guilherme G. Piccoli >> <gpiccoli@linux.vnet.ibm.com> wrote: >>> >>> From: Milton Miller <miltonm@us.ibm.com> >>> >>> While the driver is probing the adapter, an error may occur before the >>> netdev structure is allocated and attached to pci_dev. In this case, >>> not only netdev isn't available, but the tg3 private structure is also >>> not available as it is just math from the NULL pointer, so dereferences >>> must be skipped. >>> >>> The following trace is seen when the error is triggered: >>> >>> [1.402247] Unable to handle kernel paging request for data at address >>> 0x00001a99 >>> [1.402410] Faulting instruction address: 0xc0000000007e33f8 >>> [1.402450] Oops: Kernel access of bad area, sig: 11 [#1] >>> [1.402481] SMP NR_CPUS=2048 NUMA PowerNV >>> [1.402513] Modules linked in: >>> [1.402545] CPU: 0 PID: 651 Comm: eehd Not tainted 4.4.0-36-generic >>> #55-Ubuntu >>> [1.402591] task: c000001fe4e42a20 ti: c000001fe4e88000 task.ti: >>> c000001fe4e88000 >>> [1.402742] NIP: c0000000007e33f8 LR: c0000000007e3164 CTR: >>> c000000000595ea0 >>> [1.402787] REGS: c000001fe4e8b790 TRAP: 0300 Not tainted >>> (4.4.0-36-generic) >>> [1.402832] MSR: 9000000100009033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: >>> 28000422 XER: 20000000 >>> [1.403058] CFAR: c000000000008468 DAR: 0000000000001a99 DSISR: >>> 42000000 SOFTE: 1 >>> GPR00: c0000000007e3164 c000001fe4e8ba10 c0000000015c5e00 >>> 0000000000000000 >>> GPR04: 0000000000000001 0000000000000000 0000000000000039 >>> 0000000000000299 >>> GPR08: 0000000000000000 0000000000000001 c000001fe4e88000 >>> 0000000000000006 >>> GPR12: 0000000000000000 c00000000fb40000 c0000000000e6558 >>> c000003ca1bffd00 >>> GPR16: 0000000000000000 0000000000000000 0000000000000000 >>> 0000000000000000 >>> GPR20: 0000000000000000 0000000000000000 0000000000000000 >>> c000000000d52768 >>> GPR24: c000000000d52740 0000000000000100 c000003ca1b52000 >>> 0000000000000002 >>> GPR28: 0000000000000900 0000000000000000 c00000000152a0c0 >>> c000003ca1b52000 >>> [1.404226] NIP [c0000000007e33f8] tg3_io_error_detected+0x308/0x340 >>> [1.404265] LR [c0000000007e3164] tg3_io_error_detected+0x74/0x340 >>> >>> This patch avoids the NULL pointer dereference by moving the access after >>> the netdev NULL pointer check on tg3_io_error_detected(). >>> >>> Fixes: 0486a063b1ff ("tg3: prevent ifup/ifdown during PCI error >>> recovery") >>> Fixes: dfc8f370316b ("net/tg3: Release IRQs on permanent error") >>> Tested-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com> >>> Signed-off-by: Milton Miller <miltonm@us.ibm.com> >>> Signed-off-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com> >> >> >> Looks good. Do we need to add !netdev check in tg3_io_resume()? > > > Thanks Michael. It's a good point - I didn't trigger any error without the > check, but looking at error handlers, every one seems to have this check > except tg3_io_resume(). > > Do you want us to send a v2 including this check? Or maybe another patch? > I think v2 should be fine. The additional check is very much related to the v1 patch. I will ACK it once you send v2. Thanks.
diff --git a/drivers/net/ethernet/broadcom/tg3.c b/drivers/net/ethernet/broadcom/tg3.c index a2551bc..3a5fce7 100644 --- a/drivers/net/ethernet/broadcom/tg3.c +++ b/drivers/net/ethernet/broadcom/tg3.c @@ -18122,14 +18122,14 @@ static pci_ers_result_t tg3_io_error_detected(struct pci_dev *pdev, rtnl_lock(); - /* We needn't recover from permanent error */ - if (state == pci_channel_io_frozen) - tp->pcierr_recovery = true; - /* We probably don't have netdev yet */ if (!netdev || !netif_running(netdev)) goto done; + /* We needn't recover from permanent error */ + if (state == pci_channel_io_frozen) + tp->pcierr_recovery = true; + tg3_phy_stop(tp); tg3_netif_stop(tp);