diff mbox series

[1/2] net: ti: am65-cpsw-nuss: Workaround for buggy PHY/Board

Message ID 20230822121350.51324-2-rogerq@kernel.org
State Changes Requested
Delegated to: Ramon Fried
Headers show
Series net: Fix Ethernet PHY detection on Beagleplay | expand

Commit Message

Roger Quadros Aug. 22, 2023, 12:13 p.m. UTC
Beagleplay has a buggy Ethernet PHY implementation for the Gigabit
PHY in the sense that it is non responsive over MDIO immediately
after power-up/reset.

We need to either try multiple times or wait sufficiently long enough
(couple of 10s of ms?) before the PHY begins to respond correctly.

One theory is that the PHY is configured to operate on MDIO address 0
which it treats as a special broadcast address.

Datasheet states:
"PHYAD (config pins) sets the PHY address for the device.
The RTL8211F(I)/RTL8211FD(I) supports PHY addresses from 0x01 to 0x07.
Note 1: An MDIO command with PHY address=0 is a broadcast from the MAC;
each PHY device should respond."

This issue is not seen with the other PHY (different make) on the board
which is configured for address 0x1.

As a woraround we try to probe the PHY multiple times instead of giving
up on the first attempt.

Signed-off-by: Roger Quadros <rogerq@kernel.org>
---
 drivers/net/ti/am65-cpsw-nuss.c | 19 ++++++++++++++-----
 1 file changed, 14 insertions(+), 5 deletions(-)

Comments

Siddharth Vadapalli Aug. 23, 2023, 4:35 a.m. UTC | #1
Roger,

On 22/08/23 17:43, Roger Quadros wrote:
> Beagleplay has a buggy Ethernet PHY implementation for the Gigabit
> PHY in the sense that it is non responsive over MDIO immediately
> after power-up/reset.
> 
> We need to either try multiple times or wait sufficiently long enough
> (couple of 10s of ms?) before the PHY begins to respond correctly.

Based on the datasheet at:
https://datasheet.lcsc.com/lcsc/1912111437_Realtek-Semicon-RTL8211F-CG_C187932.pdf
in the section 7.17. PHY Reset (Hardware Reset), it is stated that
software has to wait for at least 50 ms before accessing the PHY
registers. Is this why the for-loop in the code below tries for at most
5 times with a delay of 10 ms before the next try? If yes, then isn't it
better to move the delay into the realtek PHY driver at
drivers/net/phy/realtek.c
Shouldn't it be the PHY driver which ensures that any reads/writes to the PHY
registers are valid? It can ensure this by having a one time 50ms delay for the
very first access to the PHY registers.

> 
> One theory is that the PHY is configured to operate on MDIO address 0
> which it treats as a special broadcast address.
> 
> Datasheet states:
> "PHYAD (config pins) sets the PHY address for the device.
> The RTL8211F(I)/RTL8211FD(I) supports PHY addresses from 0x01 to 0x07.
> Note 1: An MDIO command with PHY address=0 is a broadcast from the MAC;
> each PHY device should respond."
> 
> This issue is not seen with the other PHY (different make) on the board
> which is configured for address 0x1.
> 
> As a woraround we try to probe the PHY multiple times instead of giving
> up on the first attempt.
> 
> Signed-off-by: Roger Quadros <rogerq@kernel.org>
> ---
>  drivers/net/ti/am65-cpsw-nuss.c | 19 ++++++++++++++-----
>  1 file changed, 14 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/net/ti/am65-cpsw-nuss.c b/drivers/net/ti/am65-cpsw-nuss.c
> index 51a8167d14..4f8a2f9b93 100644
> --- a/drivers/net/ti/am65-cpsw-nuss.c
> +++ b/drivers/net/ti/am65-cpsw-nuss.c
> @@ -27,6 +27,7 @@
>  #include <syscon.h>
>  #include <linux/bitops.h>
>  #include <linux/soc/ti/ti-udma.h>
> +#include <linux/delay.h>
>  
>  #include "cpsw_mdio.h"
>  
> @@ -678,14 +679,22 @@ static int am65_cpsw_phy_init(struct udevice *dev)
>  	struct am65_cpsw_priv *priv = dev_get_priv(dev);
>  	struct am65_cpsw_common *cpsw_common = priv->cpsw_common;
>  	struct eth_pdata *pdata = dev_get_plat(dev);
> -	struct phy_device *phydev;
>  	u32 supported = PHY_GBIT_FEATURES;
> +	struct phy_device *phydev;
> +	int tries;
>  	int ret;
>  
> -	phydev = phy_connect(cpsw_common->bus,
> -			     priv->phy_addr,
> -			     priv->dev,
> -			     pdata->phy_interface);
> +	/* Some boards (e.g. beagleplay) don't work on first go */
> +	for (tries = 1; tries <= 5; tries++) {
> +		phydev = phy_connect(cpsw_common->bus,
> +				     priv->phy_addr,
> +				     priv->dev,
> +				     pdata->phy_interface);
> +		if (phydev)
> +			break;
> +
> +		mdelay(10);
> +	}
>  
>  	if (!phydev) {
>  		dev_err(dev, "phy_connect() failed\n");
Roger Quadros Aug. 23, 2023, 7:52 a.m. UTC | #2
Hi Siddharth,

On 23/08/2023 07:35, Siddharth Vadapalli wrote:
> Roger,
> 
> On 22/08/23 17:43, Roger Quadros wrote:
>> Beagleplay has a buggy Ethernet PHY implementation for the Gigabit
>> PHY in the sense that it is non responsive over MDIO immediately
>> after power-up/reset.
>>
>> We need to either try multiple times or wait sufficiently long enough
>> (couple of 10s of ms?) before the PHY begins to respond correctly.
> 
> Based on the datasheet at:
> https://datasheet.lcsc.com/lcsc/1912111437_Realtek-Semicon-RTL8211F-CG_C187932.pdf
> in the section 7.17. PHY Reset (Hardware Reset), it is stated that
> software has to wait for at least 50 ms before accessing the PHY
> registers. Is this why the for-loop in the code below tries for at most
> 5 times with a delay of 10 ms before the next try? If yes, then isn't it

Good catch. This might be the reason why PHY is not responding in the first
few attempts.

> better to move the delay into the realtek PHY driver at
> drivers/net/phy/realtek.c

We are currently at the MDIO bus driver where we don't even know what PHY
is on the bus. So this delay cannot come at the realtec PHY driver.
It has to come at the MDIO bus driver level.

> Shouldn't it be the PHY driver which ensures that any reads/writes to the PHY
> registers are valid? It can ensure this by having a one time 50ms delay for the
> very first access to the PHY registers.> 
>>
>> One theory is that the PHY is configured to operate on MDIO address 0
>> which it treats as a special broadcast address.
>>
>> Datasheet states:
>> "PHYAD (config pins) sets the PHY address for the device.
>> The RTL8211F(I)/RTL8211FD(I) supports PHY addresses from 0x01 to 0x07.
>> Note 1: An MDIO command with PHY address=0 is a broadcast from the MAC;
>> each PHY device should respond."
>>
>> This issue is not seen with the other PHY (different make) on the board
>> which is configured for address 0x1.
>>
>> As a woraround we try to probe the PHY multiple times instead of giving
>> up on the first attempt.
>>
>> Signed-off-by: Roger Quadros <rogerq@kernel.org>
>> ---
>>  drivers/net/ti/am65-cpsw-nuss.c | 19 ++++++++++++++-----
>>  1 file changed, 14 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/net/ti/am65-cpsw-nuss.c b/drivers/net/ti/am65-cpsw-nuss.c
>> index 51a8167d14..4f8a2f9b93 100644
>> --- a/drivers/net/ti/am65-cpsw-nuss.c
>> +++ b/drivers/net/ti/am65-cpsw-nuss.c
>> @@ -27,6 +27,7 @@
>>  #include <syscon.h>
>>  #include <linux/bitops.h>
>>  #include <linux/soc/ti/ti-udma.h>
>> +#include <linux/delay.h>
>>  
>>  #include "cpsw_mdio.h"
>>  
>> @@ -678,14 +679,22 @@ static int am65_cpsw_phy_init(struct udevice *dev)
>>  	struct am65_cpsw_priv *priv = dev_get_priv(dev);
>>  	struct am65_cpsw_common *cpsw_common = priv->cpsw_common;
>>  	struct eth_pdata *pdata = dev_get_plat(dev);
>> -	struct phy_device *phydev;
>>  	u32 supported = PHY_GBIT_FEATURES;
>> +	struct phy_device *phydev;
>> +	int tries;
>>  	int ret;
>>  
>> -	phydev = phy_connect(cpsw_common->bus,
>> -			     priv->phy_addr,
>> -			     priv->dev,
>> -			     pdata->phy_interface);
>> +	/* Some boards (e.g. beagleplay) don't work on first go */
>> +	for (tries = 1; tries <= 5; tries++) {
>> +		phydev = phy_connect(cpsw_common->bus,
>> +				     priv->phy_addr,
>> +				     priv->dev,
>> +				     pdata->phy_interface);
>> +		if (phydev)
>> +			break;
>> +
>> +		mdelay(10);
>> +	}
>>  
>>  	if (!phydev) {
>>  		dev_err(dev, "phy_connect() failed\n");
>
Siddharth Vadapalli Aug. 23, 2023, 8:02 a.m. UTC | #3
On 23/08/23 13:22, Roger Quadros wrote:
> Hi Siddharth,
> 
> On 23/08/2023 07:35, Siddharth Vadapalli wrote:
>> Roger,
>>
>> On 22/08/23 17:43, Roger Quadros wrote:
>>> Beagleplay has a buggy Ethernet PHY implementation for the Gigabit
>>> PHY in the sense that it is non responsive over MDIO immediately
>>> after power-up/reset.
>>>
>>> We need to either try multiple times or wait sufficiently long enough
>>> (couple of 10s of ms?) before the PHY begins to respond correctly.
>>
>> Based on the datasheet at:
>> https://datasheet.lcsc.com/lcsc/1912111437_Realtek-Semicon-RTL8211F-CG_C187932.pdf
>> in the section 7.17. PHY Reset (Hardware Reset), it is stated that
>> software has to wait for at least 50 ms before accessing the PHY
>> registers. Is this why the for-loop in the code below tries for at most
>> 5 times with a delay of 10 ms before the next try? If yes, then isn't it
> 
> Good catch. This might be the reason why PHY is not responding in the first
> few attempts.
> 
>> better to move the delay into the realtek PHY driver at
>> drivers/net/phy/realtek.c
> 
> We are currently at the MDIO bus driver where we don't even know what PHY
> is on the bus. So this delay cannot come at the realtec PHY driver.
> It has to come at the MDIO bus driver level.

Thank you for clarifying.

> 
>> Shouldn't it be the PHY driver which ensures that any reads/writes to the PHY
>> registers are valid? It can ensure this by having a one time 50ms delay for the
>> very first access to the PHY registers.> 

...

>>
>
Roger Quadros Aug. 23, 2023, 8:14 a.m. UTC | #4
On 23/08/2023 11:02, Siddharth Vadapalli wrote:
> 
> 
> On 23/08/23 13:22, Roger Quadros wrote:
>> Hi Siddharth,
>>
>> On 23/08/2023 07:35, Siddharth Vadapalli wrote:
>>> Roger,
>>>
>>> On 22/08/23 17:43, Roger Quadros wrote:
>>>> Beagleplay has a buggy Ethernet PHY implementation for the Gigabit
>>>> PHY in the sense that it is non responsive over MDIO immediately
>>>> after power-up/reset.
>>>>
>>>> We need to either try multiple times or wait sufficiently long enough
>>>> (couple of 10s of ms?) before the PHY begins to respond correctly.
>>>
>>> Based on the datasheet at:
>>> https://datasheet.lcsc.com/lcsc/1912111437_Realtek-Semicon-RTL8211F-CG_C187932.pdf
>>> in the section 7.17. PHY Reset (Hardware Reset), it is stated that
>>> software has to wait for at least 50 ms before accessing the PHY
>>> registers. Is this why the for-loop in the code below tries for at most
>>> 5 times with a delay of 10 ms before the next try? If yes, then isn't it
>>
>> Good catch. This might be the reason why PHY is not responding in the first
>> few attempts.
>>
>>> better to move the delay into the realtek PHY driver at
>>> drivers/net/phy/realtek.c
>>
>> We are currently at the MDIO bus driver where we don't even know what PHY
>> is on the bus. So this delay cannot come at the realtec PHY driver.
>> It has to come at the MDIO bus driver level.
> 
> Thank you for clarifying.
> 

Looking closer, this is already being addressed by drivers/net/eth-phy-uclass.c
in eth_phy_pre_probe()

linux/Documentation/devicetree/bindings/net/ethernet-phy.yaml

  reset-assert-us:
    description:
      Delay after the reset was asserted in microseconds. If this
      property is missing the delay will be skipped.

  reset-deassert-us:
    description:
      Delay after the reset was deasserted in microseconds. If
      this property is missing the delay will be skipped.

So we can drop this patch once we implement proper uclass driver for cpsw-mdio.

>>
>>> Shouldn't it be the PHY driver which ensures that any reads/writes to the PHY
>>> registers are valid? It can ensure this by having a one time 50ms delay for the
>>> very first access to the PHY registers.> 
> 
> ...
> 
>>>
>>
>
Roger Quadros Aug. 23, 2023, 8:35 a.m. UTC | #5
On 23/08/2023 11:14, Roger Quadros wrote:
> 
> 
> On 23/08/2023 11:02, Siddharth Vadapalli wrote:
>>
>>
>> On 23/08/23 13:22, Roger Quadros wrote:
>>> Hi Siddharth,
>>>
>>> On 23/08/2023 07:35, Siddharth Vadapalli wrote:
>>>> Roger,
>>>>
>>>> On 22/08/23 17:43, Roger Quadros wrote:
>>>>> Beagleplay has a buggy Ethernet PHY implementation for the Gigabit
>>>>> PHY in the sense that it is non responsive over MDIO immediately
>>>>> after power-up/reset.
>>>>>
>>>>> We need to either try multiple times or wait sufficiently long enough
>>>>> (couple of 10s of ms?) before the PHY begins to respond correctly.
>>>>
>>>> Based on the datasheet at:
>>>> https://datasheet.lcsc.com/lcsc/1912111437_Realtek-Semicon-RTL8211F-CG_C187932.pdf
>>>> in the section 7.17. PHY Reset (Hardware Reset), it is stated that
>>>> software has to wait for at least 50 ms before accessing the PHY
>>>> registers. Is this why the for-loop in the code below tries for at most
>>>> 5 times with a delay of 10 ms before the next try? If yes, then isn't it
>>>

At power on, the constraint is even larger.
Please see Section 9.3 Power sequence.

PHY registers can be accessed at T2 which is sum of Rt3, Rt4, Rt5
which is about 146.5ms after Power Active and PHY Reset de-asserted.

>>> Good catch. This might be the reason why PHY is not responding in the first
>>> few attempts.
>>>
>>>> better to move the delay into the realtek PHY driver at
>>>> drivers/net/phy/realtek.c
>>>
>>> We are currently at the MDIO bus driver where we don't even know what PHY
>>> is on the bus. So this delay cannot come at the realtec PHY driver.
>>> It has to come at the MDIO bus driver level.
>>
>> Thank you for clarifying.
>>
> 
> Looking closer, this is already being addressed by drivers/net/eth-phy-uclass.c
> in eth_phy_pre_probe()
> 
> linux/Documentation/devicetree/bindings/net/ethernet-phy.yaml
> 
>   reset-assert-us:
>     description:
>       Delay after the reset was asserted in microseconds. If this
>       property is missing the delay will be skipped.
> 
>   reset-deassert-us:
>     description:
>       Delay after the reset was deasserted in microseconds. If
>       this property is missing the delay will be skipped.
> 
> So we can drop this patch once we implement proper uclass driver for cpsw-mdio.
> 
>>>
>>>> Shouldn't it be the PHY driver which ensures that any reads/writes to the PHY
>>>> registers are valid? It can ensure this by having a one time 50ms delay for the
>>>> very first access to the PHY registers.> 
>>
>> ...
>>
>>>>
>>>
>>
>
Siddharth Vadapalli Aug. 24, 2023, 6:04 p.m. UTC | #6
Hello Roger,

On 22-08-2023 17:43, Roger Quadros wrote:
> Beagleplay has a buggy Ethernet PHY implementation for the Gigabit
> PHY in the sense that it is non responsive over MDIO immediately
> after power-up/reset.
> 
> We need to either try multiple times or wait sufficiently long enough
> (couple of 10s of ms?) before the PHY begins to respond correctly.
> 
> One theory is that the PHY is configured to operate on MDIO address 0
> which it treats as a special broadcast address.
> 
> Datasheet states:
> "PHYAD (config pins) sets the PHY address for the device.
> The RTL8211F(I)/RTL8211FD(I) supports PHY addresses from 0x01 to 0x07.
> Note 1: An MDIO command with PHY address=0 is a broadcast from the MAC;
> each PHY device should respond."
> 
> This issue is not seen with the other PHY (different make) on the board
> which is configured for address 0x1.
> 
> As a woraround we try to probe the PHY multiple times instead of giving
> up on the first attempt.
> 
> Signed-off-by: Roger Quadros <rogerq@kernel.org>
> ---
>  drivers/net/ti/am65-cpsw-nuss.c | 19 ++++++++++++++-----
>  1 file changed, 14 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/net/ti/am65-cpsw-nuss.c b/drivers/net/ti/am65-cpsw-nuss.c
> index 51a8167d14..4f8a2f9b93 100644
> --- a/drivers/net/ti/am65-cpsw-nuss.c
> +++ b/drivers/net/ti/am65-cpsw-nuss.c
> @@ -27,6 +27,7 @@
>  #include <syscon.h>
>  #include <linux/bitops.h>
>  #include <linux/soc/ti/ti-udma.h>
> +#include <linux/delay.h>
>  
>  #include "cpsw_mdio.h"
>  
> @@ -678,14 +679,22 @@ static int am65_cpsw_phy_init(struct udevice *dev)
>  	struct am65_cpsw_priv *priv = dev_get_priv(dev);
>  	struct am65_cpsw_common *cpsw_common = priv->cpsw_common;
>  	struct eth_pdata *pdata = dev_get_plat(dev);
> -	struct phy_device *phydev;
>  	u32 supported = PHY_GBIT_FEATURES;
> +	struct phy_device *phydev;
> +	int tries;
>  	int ret;
>  
> -	phydev = phy_connect(cpsw_common->bus,
> -			     priv->phy_addr,
> -			     priv->dev,
> -			     pdata->phy_interface);
> +	/* Some boards (e.g. beagleplay) don't work on first go */
> +	for (tries = 1; tries <= 5; tries++) {
> +		phydev = phy_connect(cpsw_common->bus,
> +				     priv->phy_addr,
> +				     priv->dev,
> +				     pdata->phy_interface);
> +		if (phydev)
> +			break;
> +
> +		mdelay(10);
> +	}

After rethinking about the above implementation and the second patch of
this series, the second patch could be dropped altogether if the
following implementation is acceptable:

phydev = phy_connect(cpsw_common->bus,
		     priv->phy_addr,
		     priv->dev,
		     pdata->phy_interface);

if (!phydev) {
	/* Some boards (e.g. beagleplay) don't work on first go */
	mdelay(50);
	phydev = phy_connect(cpsw_common->bus,
			     priv->phy_addr,
			     priv->dev,
			     pdata->phy_interface);
}

if (!phydev) {
	dev_err(dev, "phy_connect() failed\n");
...

With this, there would be at most one "PHY not found" print, which
should be fine. The mdelay value of 50 could be replaced with a
sufficiently large value which guarantees success for Beagleplay.
Tom Rini Aug. 24, 2023, 6:24 p.m. UTC | #7
On Thu, Aug 24, 2023 at 11:34:29PM +0530, Siddharth Vadapalli wrote:
> Hello Roger,
> 
> On 22-08-2023 17:43, Roger Quadros wrote:
> > Beagleplay has a buggy Ethernet PHY implementation for the Gigabit
> > PHY in the sense that it is non responsive over MDIO immediately
> > after power-up/reset.
> > 
> > We need to either try multiple times or wait sufficiently long enough
> > (couple of 10s of ms?) before the PHY begins to respond correctly.
> > 
> > One theory is that the PHY is configured to operate on MDIO address 0
> > which it treats as a special broadcast address.
> > 
> > Datasheet states:
> > "PHYAD (config pins) sets the PHY address for the device.
> > The RTL8211F(I)/RTL8211FD(I) supports PHY addresses from 0x01 to 0x07.
> > Note 1: An MDIO command with PHY address=0 is a broadcast from the MAC;
> > each PHY device should respond."
> > 
> > This issue is not seen with the other PHY (different make) on the board
> > which is configured for address 0x1.
> > 
> > As a woraround we try to probe the PHY multiple times instead of giving
> > up on the first attempt.
> > 
> > Signed-off-by: Roger Quadros <rogerq@kernel.org>
> > ---
> >  drivers/net/ti/am65-cpsw-nuss.c | 19 ++++++++++++++-----
> >  1 file changed, 14 insertions(+), 5 deletions(-)
> > 
> > diff --git a/drivers/net/ti/am65-cpsw-nuss.c b/drivers/net/ti/am65-cpsw-nuss.c
> > index 51a8167d14..4f8a2f9b93 100644
> > --- a/drivers/net/ti/am65-cpsw-nuss.c
> > +++ b/drivers/net/ti/am65-cpsw-nuss.c
> > @@ -27,6 +27,7 @@
> >  #include <syscon.h>
> >  #include <linux/bitops.h>
> >  #include <linux/soc/ti/ti-udma.h>
> > +#include <linux/delay.h>
> >  
> >  #include "cpsw_mdio.h"
> >  
> > @@ -678,14 +679,22 @@ static int am65_cpsw_phy_init(struct udevice *dev)
> >  	struct am65_cpsw_priv *priv = dev_get_priv(dev);
> >  	struct am65_cpsw_common *cpsw_common = priv->cpsw_common;
> >  	struct eth_pdata *pdata = dev_get_plat(dev);
> > -	struct phy_device *phydev;
> >  	u32 supported = PHY_GBIT_FEATURES;
> > +	struct phy_device *phydev;
> > +	int tries;
> >  	int ret;
> >  
> > -	phydev = phy_connect(cpsw_common->bus,
> > -			     priv->phy_addr,
> > -			     priv->dev,
> > -			     pdata->phy_interface);
> > +	/* Some boards (e.g. beagleplay) don't work on first go */
> > +	for (tries = 1; tries <= 5; tries++) {
> > +		phydev = phy_connect(cpsw_common->bus,
> > +				     priv->phy_addr,
> > +				     priv->dev,
> > +				     pdata->phy_interface);
> > +		if (phydev)
> > +			break;
> > +
> > +		mdelay(10);
> > +	}
> 
> After rethinking about the above implementation and the second patch of
> this series, the second patch could be dropped altogether if the
> following implementation is acceptable:
> 
> phydev = phy_connect(cpsw_common->bus,
> 		     priv->phy_addr,
> 		     priv->dev,
> 		     pdata->phy_interface);
> 
> if (!phydev) {
> 	/* Some boards (e.g. beagleplay) don't work on first go */
> 	mdelay(50);
> 	phydev = phy_connect(cpsw_common->bus,
> 			     priv->phy_addr,
> 			     priv->dev,
> 			     pdata->phy_interface);
> }
> 
> if (!phydev) {
> 	dev_err(dev, "phy_connect() failed\n");
> ...
> 
> With this, there would be at most one "PHY not found" print, which
> should be fine. The mdelay value of 50 could be replaced with a
> sufficiently large value which guarantees success for Beagleplay.

Ramon, thoughts?
Roger Quadros Aug. 24, 2023, 7:09 p.m. UTC | #8
On 24/08/2023 21:24, Tom Rini wrote:
> On Thu, Aug 24, 2023 at 11:34:29PM +0530, Siddharth Vadapalli wrote:
>> Hello Roger,
>>
>> On 22-08-2023 17:43, Roger Quadros wrote:
>>> Beagleplay has a buggy Ethernet PHY implementation for the Gigabit
>>> PHY in the sense that it is non responsive over MDIO immediately
>>> after power-up/reset.
>>>
>>> We need to either try multiple times or wait sufficiently long enough
>>> (couple of 10s of ms?) before the PHY begins to respond correctly.
>>>
>>> One theory is that the PHY is configured to operate on MDIO address 0
>>> which it treats as a special broadcast address.
>>>
>>> Datasheet states:
>>> "PHYAD (config pins) sets the PHY address for the device.
>>> The RTL8211F(I)/RTL8211FD(I) supports PHY addresses from 0x01 to 0x07.
>>> Note 1: An MDIO command with PHY address=0 is a broadcast from the MAC;
>>> each PHY device should respond."
>>>
>>> This issue is not seen with the other PHY (different make) on the board
>>> which is configured for address 0x1.
>>>
>>> As a woraround we try to probe the PHY multiple times instead of giving
>>> up on the first attempt.
>>>
>>> Signed-off-by: Roger Quadros <rogerq@kernel.org>
>>> ---
>>>  drivers/net/ti/am65-cpsw-nuss.c | 19 ++++++++++++++-----
>>>  1 file changed, 14 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/drivers/net/ti/am65-cpsw-nuss.c b/drivers/net/ti/am65-cpsw-nuss.c
>>> index 51a8167d14..4f8a2f9b93 100644
>>> --- a/drivers/net/ti/am65-cpsw-nuss.c
>>> +++ b/drivers/net/ti/am65-cpsw-nuss.c
>>> @@ -27,6 +27,7 @@
>>>  #include <syscon.h>
>>>  #include <linux/bitops.h>
>>>  #include <linux/soc/ti/ti-udma.h>
>>> +#include <linux/delay.h>
>>>  
>>>  #include "cpsw_mdio.h"
>>>  
>>> @@ -678,14 +679,22 @@ static int am65_cpsw_phy_init(struct udevice *dev)
>>>  	struct am65_cpsw_priv *priv = dev_get_priv(dev);
>>>  	struct am65_cpsw_common *cpsw_common = priv->cpsw_common;
>>>  	struct eth_pdata *pdata = dev_get_plat(dev);
>>> -	struct phy_device *phydev;
>>>  	u32 supported = PHY_GBIT_FEATURES;
>>> +	struct phy_device *phydev;
>>> +	int tries;
>>>  	int ret;
>>>  
>>> -	phydev = phy_connect(cpsw_common->bus,
>>> -			     priv->phy_addr,
>>> -			     priv->dev,
>>> -			     pdata->phy_interface);
>>> +	/* Some boards (e.g. beagleplay) don't work on first go */
>>> +	for (tries = 1; tries <= 5; tries++) {
>>> +		phydev = phy_connect(cpsw_common->bus,
>>> +				     priv->phy_addr,
>>> +				     priv->dev,
>>> +				     pdata->phy_interface);
>>> +		if (phydev)
>>> +			break;
>>> +
>>> +		mdelay(10);
>>> +	}
>>
>> After rethinking about the above implementation and the second patch of
>> this series, the second patch could be dropped altogether if the
>> following implementation is acceptable:
>>
>> phydev = phy_connect(cpsw_common->bus,
>> 		     priv->phy_addr,
>> 		     priv->dev,
>> 		     pdata->phy_interface);
>>
>> if (!phydev) {
>> 	/* Some boards (e.g. beagleplay) don't work on first go */
>> 	mdelay(50);
>> 	phydev = phy_connect(cpsw_common->bus,
>> 			     priv->phy_addr,
>> 			     priv->dev,
>> 			     pdata->phy_interface);
>> }
>>
>> if (!phydev) {
>> 	dev_err(dev, "phy_connect() failed\n");
>> ...
>>
>> With this, there would be at most one "PHY not found" print, which
>> should be fine. The mdelay value of 50 could be replaced with a
>> sufficiently large value which guarantees success for Beagleplay.
> 
> Ramon, thoughts?
> 

Even a single "PHY not found" print is not OK. It looks like an
error while it should not.

The correct solution is to use the MDIO uclass framework and add
some generic handling in the class driver. drivers/net/eth-phy-uclass.c

We could provide the delay time in the 'reset-deassert-us' property.
Although I'm not sure if this is the correct property for this case
as there is no RESET GPIO on the board. What we really want is delay
from power-on-reset. Which means we might have to introduce a new property
and use time from boot to determine if PHY is ready or not?

NOTE: PHY ready time is different for hardware reset and power-on-reset.
50ms vs 150ms
Siddharth Vadapalli Aug. 25, 2023, 5:42 a.m. UTC | #9
Roger,

On 25/08/23 00:39, Roger Quadros wrote:
> 
> 
> On 24/08/2023 21:24, Tom Rini wrote:
>> On Thu, Aug 24, 2023 at 11:34:29PM +0530, Siddharth Vadapalli wrote:
>>> Hello Roger,
>>>
>>> On 22-08-2023 17:43, Roger Quadros wrote:

...

> 
> Even a single "PHY not found" print is not OK. It looks like an
> error while it should not.
> 
> The correct solution is to use the MDIO uclass framework and add
> some generic handling in the class driver. drivers/net/eth-phy-uclass.c
> 
> We could provide the delay time in the 'reset-deassert-us' property.
> Although I'm not sure if this is the correct property for this case
> as there is no RESET GPIO on the board. What we really want is delay
> from power-on-reset. Which means we might have to introduce a new property
> and use time from boot to determine if PHY is ready or not?
> 
> NOTE: PHY ready time is different for hardware reset and power-on-reset.
> 50ms vs 150ms

Another alternative could be that of implementing the for-loop within
phy_connect() along with the delay, by updating the function with a new
parameter "tries" which indicates the number of retries before finally printing
"PHY not found" in case of an error. Optionally, another parameter "delay" can
be added, which indicates the delay within the for-loop.

This implementation will require updating all drivers in drivers/net which use
phy_connect(), with the "tries" parameter set to 1 for them. The am65-cpsw-nuss
driver can set "tries" to 5 as done in the current patch.

The idea behind moving the for-loop within phy_connect() is that it will help
solve the issue for other drivers as well, if they potentially encounter such
buggy PHYs in future boards.

>
Roger Quadros Aug. 25, 2023, 7:52 a.m. UTC | #10
Siddharth,

On 25/08/2023 08:42, Siddharth Vadapalli wrote:
> Roger,
> 
> On 25/08/23 00:39, Roger Quadros wrote:
>>
>>
>> On 24/08/2023 21:24, Tom Rini wrote:
>>> On Thu, Aug 24, 2023 at 11:34:29PM +0530, Siddharth Vadapalli wrote:
>>>> Hello Roger,
>>>>
>>>> On 22-08-2023 17:43, Roger Quadros wrote:
> 
> ...
> 
>>
>> Even a single "PHY not found" print is not OK. It looks like an
>> error while it should not.
>>
>> The correct solution is to use the MDIO uclass framework and add
>> some generic handling in the class driver. drivers/net/eth-phy-uclass.c
>>
>> We could provide the delay time in the 'reset-deassert-us' property.
>> Although I'm not sure if this is the correct property for this case
>> as there is no RESET GPIO on the board. What we really want is delay
>> from power-on-reset. Which means we might have to introduce a new property
>> and use time from boot to determine if PHY is ready or not?
>>
>> NOTE: PHY ready time is different for hardware reset and power-on-reset.
>> 50ms vs 150ms
> 
> Another alternative could be that of implementing the for-loop within
> phy_connect() along with the delay, by updating the function with a new
> parameter "tries" which indicates the number of retries before finally printing
> "PHY not found" in case of an error. Optionally, another parameter "delay" can
> be added, which indicates the delay within the for-loop.
> 
> This implementation will require updating all drivers in drivers/net which use
> phy_connect(), with the "tries" parameter set to 1 for them. The am65-cpsw-nuss
> driver can set "tries" to 5 as done in the current patch.
> 
> The idea behind moving the for-loop within phy_connect() is that it will help
> solve the issue for other drivers as well, if they potentially encounter such
> buggy PHYs in future boards.
> 

This doesn't sound like a clean solution.
All hardware specific details (like reset ready-time) should come from the
device tree and not passed around in PHY APIs.

Why to do such an intrusive change instead of fixing the am65-cpsw-nuss driver
to properly use the PHY uclass driver model and encode the required delays
in the device tree?
Siddharth Vadapalli Aug. 25, 2023, 8:22 a.m. UTC | #11
On 25/08/23 13:22, Roger Quadros wrote:
> Siddharth,
> 
> On 25/08/2023 08:42, Siddharth Vadapalli wrote:
>> Roger,
>>
>> On 25/08/23 00:39, Roger Quadros wrote:
>>>
>>>
>>> On 24/08/2023 21:24, Tom Rini wrote:
>>>> On Thu, Aug 24, 2023 at 11:34:29PM +0530, Siddharth Vadapalli wrote:
>>>>> Hello Roger,
>>>>>
>>>>> On 22-08-2023 17:43, Roger Quadros wrote:

...

>>
> 
> This doesn't sound like a clean solution.
> All hardware specific details (like reset ready-time) should come from the
> device tree and not passed around in PHY APIs.
> 
> Why to do such an intrusive change instead of fixing the am65-cpsw-nuss driver
> to properly use the PHY uclass driver model and encode the required delays
> in the device tree?

Encoding delays in device-tree and using them is better. Kindly ignore my
suggestion.

>
diff mbox series

Patch

diff --git a/drivers/net/ti/am65-cpsw-nuss.c b/drivers/net/ti/am65-cpsw-nuss.c
index 51a8167d14..4f8a2f9b93 100644
--- a/drivers/net/ti/am65-cpsw-nuss.c
+++ b/drivers/net/ti/am65-cpsw-nuss.c
@@ -27,6 +27,7 @@ 
 #include <syscon.h>
 #include <linux/bitops.h>
 #include <linux/soc/ti/ti-udma.h>
+#include <linux/delay.h>
 
 #include "cpsw_mdio.h"
 
@@ -678,14 +679,22 @@  static int am65_cpsw_phy_init(struct udevice *dev)
 	struct am65_cpsw_priv *priv = dev_get_priv(dev);
 	struct am65_cpsw_common *cpsw_common = priv->cpsw_common;
 	struct eth_pdata *pdata = dev_get_plat(dev);
-	struct phy_device *phydev;
 	u32 supported = PHY_GBIT_FEATURES;
+	struct phy_device *phydev;
+	int tries;
 	int ret;
 
-	phydev = phy_connect(cpsw_common->bus,
-			     priv->phy_addr,
-			     priv->dev,
-			     pdata->phy_interface);
+	/* Some boards (e.g. beagleplay) don't work on first go */
+	for (tries = 1; tries <= 5; tries++) {
+		phydev = phy_connect(cpsw_common->bus,
+				     priv->phy_addr,
+				     priv->dev,
+				     pdata->phy_interface);
+		if (phydev)
+			break;
+
+		mdelay(10);
+	}
 
 	if (!phydev) {
 		dev_err(dev, "phy_connect() failed\n");