Message ID | 20091210011931.GA30802@xw6200.broadcom.net |
---|---|
State | RFC, archived |
Delegated to: | David Miller |
Headers | show |
Hi, Matt Matt Carlson wrote: > I understand the problem now. The problem is that tg3_set_power_state() > puts the phy into a low-power mode when it releases the device. When > the phy is reacquired, a phy reset is missing to take the device back > out of the low-power mode. > > The patch at the bottom of this email is the fix I'm currently testing. > If you wish, you can try it out. > > Thanks a lot for the fix, I'll ask the guys with access to hardware to test it. The problem with lost link happened to me when link was synchronized at 1000Mbit/sec. At 100Mbit/sec, ifdown would result in the following soft lockup: BUG: soft lockup - CPU#0 stuck for 61s! [events/0:5] Modules linked in: ehci_hcd usbcore NIP: c0191f0c LR: c018774c CTR: c0191f04 REGS: ef83fe60 TRAP: 0901 Not tainted (2.6.31) MSR: 00029000 <EE,ME,CE> CR: 24022084 XER: 00000000 TASK = ef8312c0[5] 'events/0' THREAD: ef83e000 GPR00: f1040000 ef83ff10 ef8312c0 ef8292c0 00000400 00c04808 00000000 00000000 GPR08: 00000000 00000000 00000006 00c04800 44024084 NIP [c0191f0c] tg3_read32+0x8/0x18 LR [c018774c] _tw32_flush+0x94/0xa4 Call Trace: [ef83ff10] [ef8314b8] 0xef8314b8 (unreliable) [ef83ff30] [c0192074] tg3_adjust_link+0x98/0x24c [ef83ff50] [c017eab0] phy_state_machine+0xd4/0x5b8 [ef83ff70] [c0037570] worker_thread+0x124/0x1b8 [ef83ffc0] [c003b5fc] kthread+0x78/0x7c [ef83fff0] [c000f584] kernel_thread+0x4c/0x68 Can you reproduce this ? Thanks a lot for your help. Felix. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Dec 09, 2009 at 05:45:36PM -0800, Felix Radensky wrote: > Hi, Matt > > Matt Carlson wrote: > > I understand the problem now. The problem is that tg3_set_power_state() > > puts the phy into a low-power mode when it releases the device. When > > the phy is reacquired, a phy reset is missing to take the device back > > out of the low-power mode. > > > > The patch at the bottom of this email is the fix I'm currently testing. > > If you wish, you can try it out. > > > > > Thanks a lot for the fix, I'll ask the guys with access to hardware to > test it. > > The problem with lost link happened to me when link was synchronized > at 1000Mbit/sec. At 100Mbit/sec, ifdown would result in the following > soft lockup: > > BUG: soft lockup - CPU#0 stuck for 61s! [events/0:5] > Modules linked in: ehci_hcd usbcore > NIP: c0191f0c LR: c018774c CTR: c0191f04 > REGS: ef83fe60 TRAP: 0901 Not tainted (2.6.31) > MSR: 00029000 <EE,ME,CE> CR: 24022084 XER: 00000000 > TASK = ef8312c0[5] 'events/0' THREAD: ef83e000 > GPR00: f1040000 ef83ff10 ef8312c0 ef8292c0 00000400 00c04808 00000000 > 00000000 > GPR08: 00000000 00000000 00000006 00c04800 44024084 > NIP [c0191f0c] tg3_read32+0x8/0x18 > LR [c018774c] _tw32_flush+0x94/0xa4 > Call Trace: > [ef83ff10] [ef8314b8] 0xef8314b8 (unreliable) > [ef83ff30] [c0192074] tg3_adjust_link+0x98/0x24c > [ef83ff50] [c017eab0] phy_state_machine+0xd4/0x5b8 > [ef83ff70] [c0037570] worker_thread+0x124/0x1b8 > [ef83ffc0] [c003b5fc] kthread+0x78/0x7c > [ef83fff0] [c000f584] kernel_thread+0x4c/0x68 > > Can you reproduce this ? > > Thanks a lot for your help. Is this with the "Fix phylib locking strategy" patch in place? -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi, Matt Matt Carlson wrote: > On Wed, Dec 09, 2009 at 05:45:36PM -0800, Felix Radensky wrote: > >> Hi, Matt >> >> Matt Carlson wrote: >> >>> I understand the problem now. The problem is that tg3_set_power_state() >>> puts the phy into a low-power mode when it releases the device. When >>> the phy is reacquired, a phy reset is missing to take the device back >>> out of the low-power mode. >>> >>> The patch at the bottom of this email is the fix I'm currently testing. >>> If you wish, you can try it out. >>> >>> >>> >> Thanks a lot for the fix, I'll ask the guys with access to hardware to >> test it. >> >> The problem with lost link happened to me when link was synchronized >> at 1000Mbit/sec. At 100Mbit/sec, ifdown would result in the following >> soft lockup: >> >> BUG: soft lockup - CPU#0 stuck for 61s! [events/0:5] >> Modules linked in: ehci_hcd usbcore >> NIP: c0191f0c LR: c018774c CTR: c0191f04 >> REGS: ef83fe60 TRAP: 0901 Not tainted (2.6.31) >> MSR: 00029000 <EE,ME,CE> CR: 24022084 XER: 00000000 >> TASK = ef8312c0[5] 'events/0' THREAD: ef83e000 >> GPR00: f1040000 ef83ff10 ef8312c0 ef8292c0 00000400 00c04808 00000000 >> 00000000 >> GPR08: 00000000 00000000 00000006 00c04800 44024084 >> NIP [c0191f0c] tg3_read32+0x8/0x18 >> LR [c018774c] _tw32_flush+0x94/0xa4 >> Call Trace: >> [ef83ff10] [ef8314b8] 0xef8314b8 (unreliable) >> [ef83ff30] [c0192074] tg3_adjust_link+0x98/0x24c >> [ef83ff50] [c017eab0] phy_state_machine+0xd4/0x5b8 >> [ef83ff70] [c0037570] worker_thread+0x124/0x1b8 >> [ef83ffc0] [c003b5fc] kthread+0x78/0x7c >> [ef83fff0] [c000f584] kernel_thread+0x4c/0x68 >> >> Can you reproduce this ? >> >> Thanks a lot for your help. >> > > Is this with the "Fix phylib locking strategy" patch in place? > > Yes, this is with tg3.c and broadcom.c from 2.6.32. The hang occurs on read from EMAC Mode register. Felix. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi, Matt Matt Carlson wrote: > I understand the problem now. The problem is that tg3_set_power_state() > puts the phy into a low-power mode when it releases the device. When > the phy is reacquired, a phy reset is missing to take the device back > out of the low-power mode. > > The patch at the bottom of this email is the fix I'm currently testing. > If you wish, you can try it out. > > > [PATCH] tg3: Fix 57780 connectivity problems > > When management firmware is not present, and WOL is disabled, the driver > will put the phy into a low-power mode when shutting down. To get out > of low-power mode, the operation must be reversed or the phy must be > reset. > > In the past, the phylib reset the phy from either phy_start() or > phy_start_aneg(). This phy reset has been removed from more recent > kernels. The tg3 driver already has phy reset code in place which is > called when it assumes control of the device. It is bypassed if phylib > is enabled though. > > This patch removes the phylib exception. > --- > drivers/net/tg3.c | 3 +-- > 1 files changed, 1 insertions(+), 2 deletions(-) > > diff --git a/drivers/net/tg3.c b/drivers/net/tg3.c > index 3a74d21..c0a3080 100644 > --- a/drivers/net/tg3.c > +++ b/drivers/net/tg3.c > @@ -7523,8 +7523,7 @@ static int tg3_reset_hw(struct tg3 *tp, int reset_phy) > tg3_abort_hw(tp, 1); > } > > - if (reset_phy && > - !(tp->tg3_flags3 & TG3_FLG3_USE_PHYLIB)) > + if (reset_phy) > tg3_phy_reset(tp); > > err = tg3_chip_reset(tp); > I can confirm that this patch fixes the problem I had. Thanks a lot for the fix. There was another problem in 2.6.32 that I reported, namely kernel soft lockup when link is synchronized at 100Mbit/sec and tg3 with 57780 goes down . Were you able to reproduce it ? Thanks a lot for your help. Felix. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sun, Dec 20, 2009 at 08:44:46AM -0800, Felix Radensky wrote: > Hi, Matt > > Matt Carlson wrote: > > I understand the problem now. The problem is that tg3_set_power_state() > > puts the phy into a low-power mode when it releases the device. When > > the phy is reacquired, a phy reset is missing to take the device back > > out of the low-power mode. > > > > The patch at the bottom of this email is the fix I'm currently testing. > > If you wish, you can try it out. > > > > > > [PATCH] tg3: Fix 57780 connectivity problems > > > > When management firmware is not present, and WOL is disabled, the driver > > will put the phy into a low-power mode when shutting down. To get out > > of low-power mode, the operation must be reversed or the phy must be > > reset. > > > > In the past, the phylib reset the phy from either phy_start() or > > phy_start_aneg(). This phy reset has been removed from more recent > > kernels. The tg3 driver already has phy reset code in place which is > > called when it assumes control of the device. It is bypassed if phylib > > is enabled though. > > > > This patch removes the phylib exception. > > --- > > drivers/net/tg3.c | 3 +-- > > 1 files changed, 1 insertions(+), 2 deletions(-) > > > > diff --git a/drivers/net/tg3.c b/drivers/net/tg3.c > > index 3a74d21..c0a3080 100644 > > --- a/drivers/net/tg3.c > > +++ b/drivers/net/tg3.c > > @@ -7523,8 +7523,7 @@ static int tg3_reset_hw(struct tg3 *tp, int reset_phy) > > tg3_abort_hw(tp, 1); > > } > > > > - if (reset_phy && > > - !(tp->tg3_flags3 & TG3_FLG3_USE_PHYLIB)) > > + if (reset_phy) > > tg3_phy_reset(tp); > > > > err = tg3_chip_reset(tp); > > > I can confirm that this patch fixes the problem I had. Thanks a lot for > the fix. > > There was another problem in 2.6.32 that I reported, namely kernel soft > lockup > when link is synchronized at 100Mbit/sec and tg3 with 57780 goes down . Were > you able to reproduce it ? No, but perhaps I don't understand the problem well enough. Are you saying that you encounter the problem when you run 'ifconfig bcm57780 down', or is it that losing link itself causes the problem? -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi, Matt Matt Carlson wrote: >> >> There was another problem in 2.6.32 that I reported, namely kernel soft >> lockup >> when link is synchronized at 100Mbit/sec and tg3 with 57780 goes down . Were >> you able to reproduce it ? >> > > No, but perhaps I don't understand the problem well enough. Are you > saying that you encounter the problem when you run > 'ifconfig bcm57780 down', or is it that losing link itself causes the > problem? > > It is enough to connect the NIC to 10/100 switch and run 'ifconfig ethX down' to reproduce the problem. Thanks. Felix. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi, Matt Felix Radensky wrote: > Hi, Matt > > Matt Carlson wrote: >>> >>> There was another problem in 2.6.32 that I reported, namely kernel >>> soft lockup >>> when link is synchronized at 100Mbit/sec and tg3 with 57780 goes >>> down . Were >>> you able to reproduce it ? >>> >> >> No, but perhaps I don't understand the problem well enough. Are you >> saying that you encounter the problem when you run >> 'ifconfig bcm57780 down', or is it that losing link itself causes the >> problem? >> >> > It is enough to connect the NIC to 10/100 switch and run 'ifconfig > ethX down' to reproduce > the problem. > Did you have a chance to look into this ? Thanks. Felix. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/drivers/net/tg3.c b/drivers/net/tg3.c index 3a74d21..c0a3080 100644 --- a/drivers/net/tg3.c +++ b/drivers/net/tg3.c @@ -7523,8 +7523,7 @@ static int tg3_reset_hw(struct tg3 *tp, int reset_phy) tg3_abort_hw(tp, 1); } - if (reset_phy && - !(tp->tg3_flags3 & TG3_FLG3_USE_PHYLIB)) + if (reset_phy) tg3_phy_reset(tp); err = tg3_chip_reset(tp);