diff mbox

tg3: link is permanently down after ifdown and ifup

Message ID 20091210011931.GA30802@xw6200.broadcom.net
State RFC, archived
Delegated to: David Miller
Headers show

Commit Message

Matt Carlson Dec. 10, 2009, 1:19 a.m. UTC
I understand the problem now.  The problem is that tg3_set_power_state()
puts the phy into a low-power mode when it releases the device.  When
the phy is reacquired, a phy reset is missing to take the device back
out of the low-power mode.

The patch at the bottom of this email is the fix I'm currently testing.
If you wish, you can try it out.

On Wed, Dec 09, 2009 at 12:43:10AM -0800, Felix Radensky wrote:
> Hi, Matt
> 
> Did you have a chance to look into this problem ?
> 
> Thanks a lot.
> 
> Felix.
>  
> Michael Chan wrote:
> > On Thu, 2009-11-19 at 08:08 -0800, Felix Radensky wrote:
> >   
> >> Hi,
> >>
> >> The problem goes away if I remove the call to
> >>
> >> tg3_set_power_state(tp, PCI_D3hot);
> >>
> >> from tg3_close().
> >>     
> >
> > Added Matt to CC.  He is on vacation and may not be able to look into
> > this right away.  Thanks.


[PATCH] tg3: Fix 57780 connectivity problems

When management firmware is not present, and WOL is disabled, the driver
will put the phy into a low-power mode when shutting down.  To get out
of low-power mode, the operation must be reversed or the phy must be
reset.

In the past, the phylib reset the phy from either phy_start() or
phy_start_aneg().  This phy reset has been removed from more recent
kernels.  The tg3 driver already has phy reset code in place which is
called when it assumes control of the device.  It is bypassed if phylib
is enabled though.

This patch removes the phylib exception.
---
 drivers/net/tg3.c |    3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

Comments

Felix Radensky Dec. 10, 2009, 1:45 a.m. UTC | #1
Hi, Matt

Matt Carlson wrote:
> I understand the problem now.  The problem is that tg3_set_power_state()
> puts the phy into a low-power mode when it releases the device.  When
> the phy is reacquired, a phy reset is missing to take the device back
> out of the low-power mode.
>
> The patch at the bottom of this email is the fix I'm currently testing.
> If you wish, you can try it out.
>
>   
Thanks a lot for the fix, I'll ask the guys with access to hardware to 
test it.

The problem with lost link happened to me when link was synchronized
at 1000Mbit/sec. At 100Mbit/sec, ifdown would result in the following
soft lockup:

BUG: soft lockup - CPU#0 stuck for 61s! [events/0:5]
Modules linked in: ehci_hcd usbcore
NIP: c0191f0c LR: c018774c CTR: c0191f04
REGS: ef83fe60 TRAP: 0901 Not tainted (2.6.31)
MSR: 00029000 <EE,ME,CE> CR: 24022084 XER: 00000000
TASK = ef8312c0[5] 'events/0' THREAD: ef83e000
GPR00: f1040000 ef83ff10 ef8312c0 ef8292c0 00000400 00c04808 00000000 
00000000
GPR08: 00000000 00000000 00000006 00c04800 44024084
NIP [c0191f0c] tg3_read32+0x8/0x18
LR [c018774c] _tw32_flush+0x94/0xa4
Call Trace:
[ef83ff10] [ef8314b8] 0xef8314b8 (unreliable)
[ef83ff30] [c0192074] tg3_adjust_link+0x98/0x24c
[ef83ff50] [c017eab0] phy_state_machine+0xd4/0x5b8
[ef83ff70] [c0037570] worker_thread+0x124/0x1b8
[ef83ffc0] [c003b5fc] kthread+0x78/0x7c
[ef83fff0] [c000f584] kernel_thread+0x4c/0x68

Can you reproduce this ?

Thanks a lot for your help.

Felix.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Matt Carlson Dec. 10, 2009, 2:31 a.m. UTC | #2
On Wed, Dec 09, 2009 at 05:45:36PM -0800, Felix Radensky wrote:
> Hi, Matt
> 
> Matt Carlson wrote:
> > I understand the problem now.  The problem is that tg3_set_power_state()
> > puts the phy into a low-power mode when it releases the device.  When
> > the phy is reacquired, a phy reset is missing to take the device back
> > out of the low-power mode.
> >
> > The patch at the bottom of this email is the fix I'm currently testing.
> > If you wish, you can try it out.
> >
> >   
> Thanks a lot for the fix, I'll ask the guys with access to hardware to 
> test it.
> 
> The problem with lost link happened to me when link was synchronized
> at 1000Mbit/sec. At 100Mbit/sec, ifdown would result in the following
> soft lockup:
> 
> BUG: soft lockup - CPU#0 stuck for 61s! [events/0:5]
> Modules linked in: ehci_hcd usbcore
> NIP: c0191f0c LR: c018774c CTR: c0191f04
> REGS: ef83fe60 TRAP: 0901 Not tainted (2.6.31)
> MSR: 00029000 <EE,ME,CE> CR: 24022084 XER: 00000000
> TASK = ef8312c0[5] 'events/0' THREAD: ef83e000
> GPR00: f1040000 ef83ff10 ef8312c0 ef8292c0 00000400 00c04808 00000000 
> 00000000
> GPR08: 00000000 00000000 00000006 00c04800 44024084
> NIP [c0191f0c] tg3_read32+0x8/0x18
> LR [c018774c] _tw32_flush+0x94/0xa4
> Call Trace:
> [ef83ff10] [ef8314b8] 0xef8314b8 (unreliable)
> [ef83ff30] [c0192074] tg3_adjust_link+0x98/0x24c
> [ef83ff50] [c017eab0] phy_state_machine+0xd4/0x5b8
> [ef83ff70] [c0037570] worker_thread+0x124/0x1b8
> [ef83ffc0] [c003b5fc] kthread+0x78/0x7c
> [ef83fff0] [c000f584] kernel_thread+0x4c/0x68
> 
> Can you reproduce this ?
> 
> Thanks a lot for your help.

Is this with the "Fix phylib locking strategy" patch in place?

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Felix Radensky Dec. 10, 2009, 5:09 a.m. UTC | #3
Hi, Matt

Matt Carlson wrote:
> On Wed, Dec 09, 2009 at 05:45:36PM -0800, Felix Radensky wrote:
>   
>> Hi, Matt
>>
>> Matt Carlson wrote:
>>     
>>> I understand the problem now.  The problem is that tg3_set_power_state()
>>> puts the phy into a low-power mode when it releases the device.  When
>>> the phy is reacquired, a phy reset is missing to take the device back
>>> out of the low-power mode.
>>>
>>> The patch at the bottom of this email is the fix I'm currently testing.
>>> If you wish, you can try it out.
>>>
>>>   
>>>       
>> Thanks a lot for the fix, I'll ask the guys with access to hardware to 
>> test it.
>>
>> The problem with lost link happened to me when link was synchronized
>> at 1000Mbit/sec. At 100Mbit/sec, ifdown would result in the following
>> soft lockup:
>>
>> BUG: soft lockup - CPU#0 stuck for 61s! [events/0:5]
>> Modules linked in: ehci_hcd usbcore
>> NIP: c0191f0c LR: c018774c CTR: c0191f04
>> REGS: ef83fe60 TRAP: 0901 Not tainted (2.6.31)
>> MSR: 00029000 <EE,ME,CE> CR: 24022084 XER: 00000000
>> TASK = ef8312c0[5] 'events/0' THREAD: ef83e000
>> GPR00: f1040000 ef83ff10 ef8312c0 ef8292c0 00000400 00c04808 00000000 
>> 00000000
>> GPR08: 00000000 00000000 00000006 00c04800 44024084
>> NIP [c0191f0c] tg3_read32+0x8/0x18
>> LR [c018774c] _tw32_flush+0x94/0xa4
>> Call Trace:
>> [ef83ff10] [ef8314b8] 0xef8314b8 (unreliable)
>> [ef83ff30] [c0192074] tg3_adjust_link+0x98/0x24c
>> [ef83ff50] [c017eab0] phy_state_machine+0xd4/0x5b8
>> [ef83ff70] [c0037570] worker_thread+0x124/0x1b8
>> [ef83ffc0] [c003b5fc] kthread+0x78/0x7c
>> [ef83fff0] [c000f584] kernel_thread+0x4c/0x68
>>
>> Can you reproduce this ?
>>
>> Thanks a lot for your help.
>>     
>
> Is this with the "Fix phylib locking strategy" patch in place?
>
>   
Yes, this is with tg3.c and broadcom.c from 2.6.32. The hang occurs on 
read from EMAC Mode register.

Felix.


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Felix Radensky Dec. 20, 2009, 4:44 p.m. UTC | #4
Hi, Matt

Matt Carlson wrote:
> I understand the problem now.  The problem is that tg3_set_power_state()
> puts the phy into a low-power mode when it releases the device.  When
> the phy is reacquired, a phy reset is missing to take the device back
> out of the low-power mode.
>
> The patch at the bottom of this email is the fix I'm currently testing.
> If you wish, you can try it out.
>
>
> [PATCH] tg3: Fix 57780 connectivity problems
>
> When management firmware is not present, and WOL is disabled, the driver
> will put the phy into a low-power mode when shutting down.  To get out
> of low-power mode, the operation must be reversed or the phy must be
> reset.
>
> In the past, the phylib reset the phy from either phy_start() or
> phy_start_aneg().  This phy reset has been removed from more recent
> kernels.  The tg3 driver already has phy reset code in place which is
> called when it assumes control of the device.  It is bypassed if phylib
> is enabled though.
>
> This patch removes the phylib exception.
> ---
>  drivers/net/tg3.c |    3 +--
>  1 files changed, 1 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/tg3.c b/drivers/net/tg3.c
> index 3a74d21..c0a3080 100644
> --- a/drivers/net/tg3.c
> +++ b/drivers/net/tg3.c
> @@ -7523,8 +7523,7 @@ static int tg3_reset_hw(struct tg3 *tp, int reset_phy)
>  		tg3_abort_hw(tp, 1);
>  	}
>  
> -	if (reset_phy &&
> -	    !(tp->tg3_flags3 & TG3_FLG3_USE_PHYLIB))
> +	if (reset_phy)
>  		tg3_phy_reset(tp);
>  
>  	err = tg3_chip_reset(tp);
>   
I can confirm that this patch fixes the problem I had. Thanks a lot for 
the fix.

There was another problem in 2.6.32 that I reported, namely kernel soft 
lockup
when link is synchronized at 100Mbit/sec and tg3 with 57780 goes down . Were
you able to reproduce it ?

Thanks a lot for your help.

Felix.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Matt Carlson Dec. 21, 2009, 6:19 p.m. UTC | #5
On Sun, Dec 20, 2009 at 08:44:46AM -0800, Felix Radensky wrote:
> Hi, Matt
> 
> Matt Carlson wrote:
> > I understand the problem now.  The problem is that tg3_set_power_state()
> > puts the phy into a low-power mode when it releases the device.  When
> > the phy is reacquired, a phy reset is missing to take the device back
> > out of the low-power mode.
> >
> > The patch at the bottom of this email is the fix I'm currently testing.
> > If you wish, you can try it out.
> >
> >
> > [PATCH] tg3: Fix 57780 connectivity problems
> >
> > When management firmware is not present, and WOL is disabled, the driver
> > will put the phy into a low-power mode when shutting down.  To get out
> > of low-power mode, the operation must be reversed or the phy must be
> > reset.
> >
> > In the past, the phylib reset the phy from either phy_start() or
> > phy_start_aneg().  This phy reset has been removed from more recent
> > kernels.  The tg3 driver already has phy reset code in place which is
> > called when it assumes control of the device.  It is bypassed if phylib
> > is enabled though.
> >
> > This patch removes the phylib exception.
> > ---
> >  drivers/net/tg3.c |    3 +--
> >  1 files changed, 1 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/net/tg3.c b/drivers/net/tg3.c
> > index 3a74d21..c0a3080 100644
> > --- a/drivers/net/tg3.c
> > +++ b/drivers/net/tg3.c
> > @@ -7523,8 +7523,7 @@ static int tg3_reset_hw(struct tg3 *tp, int reset_phy)
> >  		tg3_abort_hw(tp, 1);
> >  	}
> >  
> > -	if (reset_phy &&
> > -	    !(tp->tg3_flags3 & TG3_FLG3_USE_PHYLIB))
> > +	if (reset_phy)
> >  		tg3_phy_reset(tp);
> >  
> >  	err = tg3_chip_reset(tp);
> >   
> I can confirm that this patch fixes the problem I had. Thanks a lot for 
> the fix.
> 
> There was another problem in 2.6.32 that I reported, namely kernel soft 
> lockup
> when link is synchronized at 100Mbit/sec and tg3 with 57780 goes down . Were
> you able to reproduce it ?

No, but perhaps I don't understand the problem well enough.  Are you
saying that you encounter the problem when you run
'ifconfig bcm57780 down', or is it that losing link itself causes the
problem?

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Felix Radensky Dec. 21, 2009, 9:52 p.m. UTC | #6
Hi, Matt

Matt Carlson wrote:
>>
>> There was another problem in 2.6.32 that I reported, namely kernel soft 
>> lockup
>> when link is synchronized at 100Mbit/sec and tg3 with 57780 goes down . Were
>> you able to reproduce it ?
>>     
>
> No, but perhaps I don't understand the problem well enough.  Are you
> saying that you encounter the problem when you run
> 'ifconfig bcm57780 down', or is it that losing link itself causes the
> problem?
>
>   
It is enough to connect the NIC to 10/100 switch and run 'ifconfig ethX 
down' to reproduce
the problem.

Thanks.

Felix.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Felix Radensky Jan. 4, 2010, 1:41 p.m. UTC | #7
Hi, Matt

Felix Radensky wrote:
> Hi, Matt
>
> Matt Carlson wrote:
>>>
>>> There was another problem in 2.6.32 that I reported, namely kernel 
>>> soft lockup
>>> when link is synchronized at 100Mbit/sec and tg3 with 57780 goes 
>>> down . Were
>>> you able to reproduce it ?
>>>     
>>
>> No, but perhaps I don't understand the problem well enough.  Are you
>> saying that you encounter the problem when you run
>> 'ifconfig bcm57780 down', or is it that losing link itself causes the
>> problem?
>>
>>   
> It is enough to connect the NIC to 10/100 switch and run 'ifconfig 
> ethX down' to reproduce
> the problem.
>
Did you have a chance to look into this ?

Thanks.

Felix.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/net/tg3.c b/drivers/net/tg3.c
index 3a74d21..c0a3080 100644
--- a/drivers/net/tg3.c
+++ b/drivers/net/tg3.c
@@ -7523,8 +7523,7 @@  static int tg3_reset_hw(struct tg3 *tp, int reset_phy)
 		tg3_abort_hw(tp, 1);
 	}
 
-	if (reset_phy &&
-	    !(tp->tg3_flags3 & TG3_FLG3_USE_PHYLIB))
+	if (reset_phy)
 		tg3_phy_reset(tp);
 
 	err = tg3_chip_reset(tp);