diff mbox

net: phy: fix auto-negotiation stall due to unavailable interrupt

Message ID 1492609604-16359-1-git-send-email-al.kochet@gmail.com
State Superseded, archived
Delegated to: David Miller
Headers show

Commit Message

Alexander Kochetkov April 19, 2017, 1:46 p.m. UTC
The problem I fix related to SMSC LAN8710/LAN8720 PHY handled using
interrupts. During power-up cycle the PHY do auto-negotiation, generate
interrupt and set BMSR_ANEGCOMPLETE flag. Interrupt is handled by PHY
state machine but doesn't update link because PHY is in PHY_READY state.
After some time MAC bring up and connect with PHY. It start PHY using
phy_start(). During startup PHY change state to PHY_AN but doesn't
set BMCR_ANRESTART flag due to genphy_config_aneg() doesn't update MII_BMCR
because there no new to advertising. As a result, state machine wait for
interrupt from PHY and nether get "link is up". Because BMSR_ANEGCOMPLETE
already set the patch schedule check link without waiting interrupt.
In case genphy_config_aneg() update MII_BMCR and set BMCR_ANRESTART
flag, BMSR_ANEGCOMPLETE will be cleared and state machine will continue
on auto-negotiation interrupt.

Signed-off-by: Alexander Kochetkov <al.kochet@gmail.com>
---
 drivers/net/phy/phy.c |   12 ++++++++++++
 1 file changed, 12 insertions(+)

Comments

Alexander Kochetkov April 19, 2017, 2:05 p.m. UTC | #1
Just found similar problem fixed in another PHY. See commit 99f81afc139c
("phy: micrel: Disable auto negotiation on startup»)

> 19 апр. 2017 г., в 16:46, Alexander Kochetkov <al.kochet@gmail.com> написал(а):
> 
> The problem I fix related to SMSC LAN8710/LAN8720 PHY handled using
> interrupts. During power-up cycle the PHY do auto-negotiation, generate
> interrupt and set BMSR_ANEGCOMPLETE flag. Interrupt is handled by PHY
> state machine but doesn't update link because PHY is in PHY_READY state.
> After some time MAC bring up and connect with PHY. It start PHY using
> phy_start(). During startup PHY change state to PHY_AN but doesn't
> set BMCR_ANRESTART flag due to genphy_config_aneg() doesn't update MII_BMCR
> because there no new to advertising. As a result, state machine wait for
> interrupt from PHY and nether get "link is up". Because BMSR_ANEGCOMPLETE
> already set the patch schedule check link without waiting interrupt.
> In case genphy_config_aneg() update MII_BMCR and set BMCR_ANRESTART
> flag, BMSR_ANEGCOMPLETE will be cleared and state machine will continue
> on auto-negotiation interrupt.
> 
> Signed-off-by: Alexander Kochetkov <al.kochet@gmail.com>
> ---
> drivers/net/phy/phy.c |   12 ++++++++++++
> 1 file changed, 12 insertions(+)
> 
> diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
> index 7cc1b7d..da8f03d 100644
> --- a/drivers/net/phy/phy.c
> +++ b/drivers/net/phy/phy.c
> @@ -1169,6 +1169,18 @@ void phy_state_machine(struct work_struct *work)
> 	if (phydev->irq == PHY_POLL)
> 		queue_delayed_work(system_power_efficient_wq, &phydev->state_queue,
> 				   PHY_STATE_TIME * HZ);
> +
> +	/* Re-schedule a PHY state machine to check PHY status because
> +	 * negotiation already done and aneg interrupt may not be generated.
> +	 */
> +	if (needs_aneg && (phydev->irq > 0) && (phydev->state == PHY_AN)) {
> +		err = phy_aneg_done(phydev);
> +		if (err > 0)
> +			queue_delayed_work(system_power_efficient_wq,
> +					   &phydev->state_queue, 0);
> +		if (err < 0)
> +			phy_error(phydev);
> +	}
> }
> 
> /**
> -- 
> 1.7.9.5
>
Florian Fainelli April 19, 2017, 4:32 p.m. UTC | #2
On 04/19/2017 07:05 AM, Alexander Kochetkov wrote:
> Just found similar problem fixed in another PHY. See commit 99f81afc139c
> ("phy: micrel: Disable auto negotiation on startup»)

This specific commit was really a "this works for me, but I have not
investigated what needs fixing in PHYLIB".

Roger has been submitting a couple of patches to fix the same issue and
yours seems to be better in that it also addresses his concerns that
after his fix there was still a 1s delay (the HZ parameter to the
queue_delayed_work call) being seen:

http://patchwork.ozlabs.org/patch/743773/

Roger can you also test Alexander's patch?

> 
>> 19 апр. 2017 г., в 16:46, Alexander Kochetkov <al.kochet@gmail.com> написал(а):
>>
>> The problem I fix related to SMSC LAN8710/LAN8720 PHY handled using
>> interrupts. During power-up cycle the PHY do auto-negotiation, generate
>> interrupt and set BMSR_ANEGCOMPLETE flag. Interrupt is handled by PHY
>> state machine but doesn't update link because PHY is in PHY_READY state.
>> After some time MAC bring up and connect with PHY. It start PHY using
>> phy_start(). During startup PHY change state to PHY_AN but doesn't
>> set BMCR_ANRESTART flag due to genphy_config_aneg() doesn't update MII_BMCR
>> because there no new to advertising. As a result, state machine wait for
>> interrupt from PHY and nether get "link is up". Because BMSR_ANEGCOMPLETE
>> already set the patch schedule check link without waiting interrupt.
>> In case genphy_config_aneg() update MII_BMCR and set BMCR_ANRESTART
>> flag, BMSR_ANEGCOMPLETE will be cleared and state machine will continue
>> on auto-negotiation interrupt.
>>
>> Signed-off-by: Alexander Kochetkov <al.kochet@gmail.com>
>> ---
>> drivers/net/phy/phy.c |   12 ++++++++++++
>> 1 file changed, 12 insertions(+)
>>
>> diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
>> index 7cc1b7d..da8f03d 100644
>> --- a/drivers/net/phy/phy.c
>> +++ b/drivers/net/phy/phy.c
>> @@ -1169,6 +1169,18 @@ void phy_state_machine(struct work_struct *work)
>> 	if (phydev->irq == PHY_POLL)
>> 		queue_delayed_work(system_power_efficient_wq, &phydev->state_queue,
>> 				   PHY_STATE_TIME * HZ);
>> +
>> +	/* Re-schedule a PHY state machine to check PHY status because
>> +	 * negotiation already done and aneg interrupt may not be generated.
>> +	 */
>> +	if (needs_aneg && (phydev->irq > 0) && (phydev->state == PHY_AN)) {
>> +		err = phy_aneg_done(phydev);
>> +		if (err > 0)
>> +			queue_delayed_work(system_power_efficient_wq,
>> +					   &phydev->state_queue, 0);
>> +		if (err < 0)
>> +			phy_error(phydev);
>> +	}
>> }
>>
>> /**
>> -- 
>> 1.7.9.5
>>
>
Alexander Kochetkov April 19, 2017, 4:53 p.m. UTC | #3
> 19 апр. 2017 г., в 19:32, Florian Fainelli <f.fainelli@gmail.com> написал(а):
> 
> http://patchwork.ozlabs.org/patch/743773/
> 
> Roger can you also test Alexander's patch?

If MAC use phy_start_aneg() instead of phy_start() my patch will not work as
expected. Roger, if patch don’t work for you please check what MAC bring up PHY using
phy_start():

http://patchwork.ozlabs.org/patch/752308/

Is it correct to start PHY inside MAC probe using phy_start_aneg()? Or phy_start() must be used?

And probably this tags should be added for my patch:
Fixes: 3c293f4e08b5 ("net: phy: Trigger state machine on state change and not polling.")
Cc: stable <stable@vger.kernel.org> # v4.9+

Because I bisected to commit 529ed1275263 ("net: phy: phy drivers
should not set SUPPORTED_[Asym_]Pause») that looks pretty good.

Also, there is another issue I found. link_timeout doesn’t work for interrupt driven PHY.
It is possible to implement timer to handle this case.
Florian, what do you think? Should this be fixed?

Alexander.
Sergei Shtylyov April 19, 2017, 6:11 p.m. UTC | #4
Hello!

On 04/19/2017 04:46 PM, Alexander Kochetkov wrote:

> The problem I fix related to SMSC LAN8710/LAN8720 PHY handled using
> interrupts. During power-up cycle the PHY do auto-negotiation, generate
> interrupt and set BMSR_ANEGCOMPLETE flag. Interrupt is handled by PHY
> state machine but doesn't update link because PHY is in PHY_READY state.
> After some time MAC bring up and connect with PHY. It start PHY using
> phy_start(). During startup PHY change state to PHY_AN but doesn't
> set BMCR_ANRESTART flag due to genphy_config_aneg() doesn't update MII_BMCR
> because there no new to advertising. As a result, state machine wait for
           ^^^^^^^^^^^^    ^^^^^^^^^^^
    There is no new what? Advertize, maybe?

> interrupt from PHY and nether get "link is up". Because BMSR_ANEGCOMPLETE
                          ^^^^^^
    Never, neither? :-)

> already set the patch schedule check link without waiting interrupt.
> In case genphy_config_aneg() update MII_BMCR and set BMCR_ANRESTART
> flag, BMSR_ANEGCOMPLETE will be cleared and state machine will continue
> on auto-negotiation interrupt.
>
> Signed-off-by: Alexander Kochetkov <al.kochet@gmail.com>
[...]

MBR, Sergei
Roger Quadros April 20, 2017, 9:21 a.m. UTC | #5
Hi,

On 19/04/17 19:53, Alexander Kochetkov wrote:
> 
>> 19 апр. 2017 г., в 19:32, Florian Fainelli <f.fainelli@gmail.com> написал(а):
>>
>> http://patchwork.ozlabs.org/patch/743773/
>>
>> Roger can you also test Alexander's patch?

This patch fixes my problem and doesn't have the 1 second delay that my patch had.

So,

Tested-by: Roger Quadros <rogerq@ti.com>

> 
> If MAC use phy_start_aneg() instead of phy_start() my patch will not work as
> expected. Roger, if patch don’t work for you please check what MAC bring up PHY using
> phy_start():

Our MAC driver is using phy_start() and I didn't face any issues with your patch.

> 
> http://patchwork.ozlabs.org/patch/752308/
> 
> Is it correct to start PHY inside MAC probe using phy_start_aneg()? Or phy_start() must be used?
> 
> And probably this tags should be added for my patch:
> Fixes: 3c293f4e08b5 ("net: phy: Trigger state machine on state change and not polling.")
> Cc: stable <stable@vger.kernel.org> # v4.9+
> 
> Because I bisected to commit 529ed1275263 ("net: phy: phy drivers
> should not set SUPPORTED_[Asym_]Pause») that looks pretty good.
> 
> Also, there is another issue I found. link_timeout doesn’t work for interrupt driven PHY.
> It is possible to implement timer to handle this case.
> Florian, what do you think? Should this be fixed?
> 
> Alexander.
> 

cheers,
-roger
diff mbox

Patch

diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
index 7cc1b7d..da8f03d 100644
--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -1169,6 +1169,18 @@  void phy_state_machine(struct work_struct *work)
 	if (phydev->irq == PHY_POLL)
 		queue_delayed_work(system_power_efficient_wq, &phydev->state_queue,
 				   PHY_STATE_TIME * HZ);
+
+	/* Re-schedule a PHY state machine to check PHY status because
+	 * negotiation already done and aneg interrupt may not be generated.
+	 */
+	if (needs_aneg && (phydev->irq > 0) && (phydev->state == PHY_AN)) {
+		err = phy_aneg_done(phydev);
+		if (err > 0)
+			queue_delayed_work(system_power_efficient_wq,
+					   &phydev->state_queue, 0);
+		if (err < 0)
+			phy_error(phydev);
+	}
 }
 
 /**