diff mbox

Revert "net: phy: turn carrier off on phy attach"

Message ID 1455300113-2335-1-git-send-email-clemens.gruber@pqgruber.com
State Superseded, archived
Delegated to: David Miller
Headers show

Commit Message

Clemens Gruber Feb. 12, 2016, 6:01 p.m. UTC
Commit 113c74d83eef ("net: phy: turn carrier off on phy attach") breaks
the eth0 link coming up on all my i.MX6Q boards with a Marvell 88E1510.
If I then do a ifconfig eth0 down/up cycle I first get a MDIO read
timeout but then the link becomes ready and everything is back to
normal.
Without this step however, the link stays down forever, an unusually
high amount of phy interrupts occur (about 10000/second) and kworker/0:2
is constantly using over 60% of the CPU.

Reverting it fixes the problems with the link not coming up at boot as
well as the high amount of phy interrupts and kworker load in that
state.

This reverts commit 113c74d83eef870e43a0d9279044e9d5435f0d07.

Signed-off-by: Clemens Gruber <clemens.gruber@pqgruber.com>
---
 drivers/net/phy/phy_device.c | 5 -----
 1 file changed, 5 deletions(-)

Comments

Florian Fainelli Feb. 12, 2016, 6:56 p.m. UTC | #1
On 12/02/16 10:01, Clemens Gruber wrote:
> Commit 113c74d83eef ("net: phy: turn carrier off on phy attach") breaks
> the eth0 link coming up on all my i.MX6Q boards with a Marvell 88E1510.
> If I then do a ifconfig eth0 down/up cycle I first get a MDIO read
> timeout but then the link becomes ready and everything is back to
> normal.
> Without this step however, the link stays down forever, an unusually
> high amount of phy interrupts occur (about 10000/second) and kworker/0:2
> is constantly using over 60% of the CPU.
> 
> Reverting it fixes the problems with the link not coming up at boot as
> well as the high amount of phy interrupts and kworker load in that
> state.

You are seeing this with the FEC driver right? We probably want to
carefully audit the driver and understand what could be going wrong, the
initial change is correct, so there must be something else going on here.

> 
> This reverts commit 113c74d83eef870e43a0d9279044e9d5435f0d07.
> 
> Signed-off-by: Clemens Gruber <clemens.gruber@pqgruber.com>
> ---
>  drivers/net/phy/phy_device.c | 5 -----
>  1 file changed, 5 deletions(-)
> 
> diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
> index bad3f00..903737a 100644
> --- a/drivers/net/phy/phy_device.c
> +++ b/drivers/net/phy/phy_device.c
> @@ -901,11 +901,6 @@ int phy_attach_direct(struct net_device *dev, struct phy_device *phydev,
>  
>  	phydev->state = PHY_READY;
>  
> -	/* Initial carrier state is off as the phy is about to be
> -	 * (re)initialized.
> -	 */
> -	netif_carrier_off(phydev->attached_dev);
> -
>  	/* Do initial configuration here, now that
>  	 * we have certain key parameters
>  	 * (dev_flags and interface)
>
Clemens Gruber Feb. 12, 2016, 9:07 p.m. UTC | #2
Hi Florian,

On Fri, Feb 12, 2016 at 10:56:04AM -0800, Florian Fainelli wrote:
> On 12/02/16 10:01, Clemens Gruber wrote:
> > Commit 113c74d83eef ("net: phy: turn carrier off on phy attach") breaks
> > the eth0 link coming up on all my i.MX6Q boards with a Marvell 88E1510.
> > If I then do a ifconfig eth0 down/up cycle I first get a MDIO read
> > timeout but then the link becomes ready and everything is back to
> > normal.
> > Without this step however, the link stays down forever, an unusually
> > high amount of phy interrupts occur (about 10000/second) and kworker/0:2
> > is constantly using over 60% of the CPU.
> > 
> > Reverting it fixes the problems with the link not coming up at boot as
> > well as the high amount of phy interrupts and kworker load in that
> > state.
> 
> You are seeing this with the FEC driver right? We probably want to
> carefully audit the driver and understand what could be going wrong, the
> initial change is correct, so there must be something else going on here.
> 

Yes, this occurs with the fec driver.

In the fec_probe function at line 3471 of
net/ethernet/freescale/fec_main.c, netif_carrier_off is called and a
comment states "Carrier starts down, phylib will bring it up".
Could this be the source of the problem, both fec and phy expecting the
other one to turn on the carrier?

If you have an idea about what could be going wrong in the fec driver,
please let me know.

Thanks,
Clemens
Clemens Gruber Feb. 14, 2016, 6:25 p.m. UTC | #3
On Fri, Feb 12, 2016 at 10:56:04AM -0800, Florian Fainelli wrote:
> On 12/02/16 10:01, Clemens Gruber wrote:
> > Commit 113c74d83eef ("net: phy: turn carrier off on phy attach") breaks
> > the eth0 link coming up on all my i.MX6Q boards with a Marvell 88E1510.
> > If I then do a ifconfig eth0 down/up cycle I first get a MDIO read
> > timeout but then the link becomes ready and everything is back to
> > normal.
> > Without this step however, the link stays down forever, an unusually
> > high amount of phy interrupts occur (about 10000/second) and kworker/0:2
> > is constantly using over 60% of the CPU.
> > 
> > Reverting it fixes the problems with the link not coming up at boot as
> > well as the high amount of phy interrupts and kworker load in that
> > state.
> 
> You are seeing this with the FEC driver right? We probably want to
> carefully audit the driver and understand what could be going wrong, the
> initial change is correct, so there must be something else going on here.

I think I found the underlying problem!
It was not the fec driver but the marvell phy driver, more specifically
the marvell_of_reg_init call being made too late, which lead to the
observed problem at half the boot ups where the link never came up.

With the marvell,reg-init device tree parameter, a flag needs to be set
to tell the Marvell 88E1510 that it should enable the interrupt output.
(At a specific pin, in my case LED[2])
If this is not set (or set too late), the phydev->state is set to UP in
phy_start (called from fec_enet_open) but then, the auto-negotiation
never starts.
In comparison, now, after I called marvell_of_reg_init not in
m88e1510_config_aneg but in marvell_probe, everything works again :)
About a second after the fec_enet_open/phy_start calls, the
auto-negotiation starts (m88e1510_config_aneg) and the phy state changes
from UP to AN, to CHANGELINK and finally to RUNNING.

I will send a patch shortly, calling marvell_of_reg_init from a new
m88e1510_probe function instead of the m88e1510_config_aneg function.

Thanks.
Clemens
Florian Fainelli Feb. 14, 2016, 6:56 p.m. UTC | #4
On February 14, 2016 10:25:30 AM PST, Clemens Gruber <clemens.gruber@pqgruber.com> wrote:
>On Fri, Feb 12, 2016 at 10:56:04AM -0800, Florian Fainelli wrote:
>> On 12/02/16 10:01, Clemens Gruber wrote:
>> > Commit 113c74d83eef ("net: phy: turn carrier off on phy attach")
>breaks
>> > the eth0 link coming up on all my i.MX6Q boards with a Marvell
>88E1510.
>> > If I then do a ifconfig eth0 down/up cycle I first get a MDIO read
>> > timeout but then the link becomes ready and everything is back to
>> > normal.
>> > Without this step however, the link stays down forever, an
>unusually
>> > high amount of phy interrupts occur (about 10000/second) and
>kworker/0:2
>> > is constantly using over 60% of the CPU.
>> > 
>> > Reverting it fixes the problems with the link not coming up at boot
>as
>> > well as the high amount of phy interrupts and kworker load in that
>> > state.
>> 
>> You are seeing this with the FEC driver right? We probably want to
>> carefully audit the driver and understand what could be going wrong,
>the
>> initial change is correct, so there must be something else going on
>here.
>
>I think I found the underlying problem!
>It was not the fec driver but the marvell phy driver, more specifically
>the marvell_of_reg_init call being made too late, which lead to the
>observed problem at half the boot ups where the link never came up.
>
>With the marvell,reg-init device tree parameter, a flag needs to be set
>to tell the Marvell 88E1510 that it should enable the interrupt output.
>(At a specific pin, in my case LED[2])
>If this is not set (or set too late), the phydev->state is set to UP in
>phy_start (called from fec_enet_open) but then, the auto-negotiation
>never starts.
>In comparison, now, after I called marvell_of_reg_init not in
>m88e1510_config_aneg but in marvell_probe, everything works again :)
>About a second after the fec_enet_open/phy_start calls, the
>auto-negotiation starts (m88e1510_config_aneg) and the phy state
>changes
>from UP to AN, to CHANGELINK and finally to RUNNING.
>
>I will send a patch shortly, calling marvell_of_reg_init from a new
>m88e1510_probe function instead of the m88e1510_config_aneg function.

config_init is more appropriate here since this call back will be called even if there is a software reset (e.g: from phy_init_hw). config_aneg is definitely too late, thanks for finding this!
diff mbox

Patch

diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index bad3f00..903737a 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -901,11 +901,6 @@  int phy_attach_direct(struct net_device *dev, struct phy_device *phydev,
 
 	phydev->state = PHY_READY;
 
-	/* Initial carrier state is off as the phy is about to be
-	 * (re)initialized.
-	 */
-	netif_carrier_off(phydev->attached_dev);
-
 	/* Do initial configuration here, now that
 	 * we have certain key parameters
 	 * (dev_flags and interface)