diff mbox series

net: phy: realtek: regression, kernel null pointer dereference

Message ID 16f75ff4-e3e3-4d96-b084-e772e3ce1c2b@gmail.com
State RFC
Delegated to: David Miller
Headers show
Series net: phy: realtek: regression, kernel null pointer dereference | expand

Commit Message

Vicente Bergas May 10, 2019, 3:05 p.m. UTC
Hello,
there is a regression on linux v5.1-9573-gb970afcfcabd with a kernel null
pointer dereference.
The issue is the commit f81dadbcf7fd067baf184b63c179fc392bdb226e
  net: phy: realtek: Add rtl8211e rx/tx delays config
which uncovered a bug in phy-core when attempting to call
  phydev->drv->read_page
which can be null.
The patch to drivers/net/phy/phy-core.c below fixes the kernel null pointer
dereference. After applying the patch, there is still no network. I have
also tested the patch to drivers/net/phy/realtek.c, but no success. The
system hangs forever while initializing eth0.

Any suggestions?

Regards,
  Vicenç.

Comments

Heiner Kallweit May 10, 2019, 8:28 p.m. UTC | #1
On 10.05.2019 17:05, Vicente Bergas wrote:
> Hello,
> there is a regression on linux v5.1-9573-gb970afcfcabd with a kernel null
> pointer dereference.
> The issue is the commit f81dadbcf7fd067baf184b63c179fc392bdb226e
>  net: phy: realtek: Add rtl8211e rx/tx delays config
> which uncovered a bug in phy-core when attempting to call
>  phydev->drv->read_page
> which can be null.
> The patch to drivers/net/phy/phy-core.c below fixes the kernel null pointer
> dereference. After applying the patch, there is still no network. I have
> also tested the patch to drivers/net/phy/realtek.c, but no success. The
> system hangs forever while initializing eth0.
> 
> Any suggestions?
> 
The page operation callbacks are missing in the RTL8211E driver.
I just submitted a fix adding these callbacks to few Realtek PHY drivers
including RTl8211E. This should fix the issue.
Nevertheless your proposed patch looks good to me, just one small change
would be needed and it should be splitted.

The change to phy-core I would consider a fix and it should be fine to
submit it to net (net-next is closed currently).

Adding the warning to the Realtek driver is fine, but this would be
something for net-next once it's open again.

> Regards,
>  Vicenç.
> 
Heiner

> --- a/drivers/net/phy/phy-core.c
> +++ b/drivers/net/phy/phy-core.c
> @@ -648,11 +648,17 @@
> 
> static int __phy_read_page(struct phy_device *phydev)
> {
> +    if (!phydev->drv->read_page)
> +        return -EOPNOTSUPP;
> +   
>     return phydev->drv->read_page(phydev);
> }
> 
> static int __phy_write_page(struct phy_device *phydev, int page)
> {
> +    if (!phydev->drv->write_page)
> +        return -EOPNOTSUPP;
> +
>     return phydev->drv->write_page(phydev, page);
> }
> --- a/drivers/net/phy/realtek.c
> +++ b/drivers/net/phy/realtek.c
> @@ -214,8 +214,10 @@
>      * for details).
>      */
>     oldpage = phy_select_page(phydev, 0x7);
> -    if (oldpage < 0)
> -        goto err_restore_page;
> +    if (oldpage < 0) {
> +        dev_warn(&phydev->mdio.dev, "Unable to set rgmii delays\n");

Here phydev_warn() should be used.

> +        return 0;
> +    }
> 
>     ret = phy_write(phydev, RTL821x_EXT_PAGE_SELECT, 0xa4);
>     if (ret)
> 
>
Russell King (Oracle) May 10, 2019, 9:04 p.m. UTC | #2
On Fri, May 10, 2019 at 05:05:13PM +0200, Vicente Bergas wrote:
> Hello,
> there is a regression on linux v5.1-9573-gb970afcfcabd with a kernel null
> pointer dereference.
> The issue is the commit f81dadbcf7fd067baf184b63c179fc392bdb226e
>  net: phy: realtek: Add rtl8211e rx/tx delays config
> which uncovered a bug in phy-core when attempting to call
>  phydev->drv->read_page
> which can be null.
> The patch to drivers/net/phy/phy-core.c below fixes the kernel null pointer
> dereference. After applying the patch, there is still no network. I have
> also tested the patch to drivers/net/phy/realtek.c, but no success. The
> system hangs forever while initializing eth0.

You're not supposed to call these functions unless you provide the page
read/write page functions.  The fact that this code has crept in shows
that the patch adding the call to phy_select_page() in the realtek
driver was patently never tested, which, IMHO is bad software
engineering practice.  No, it's not even engineering practice, it's
an untested hack.

I don't see any point in adding run-time checks - that will only add
additional code, and we lose the backtrace.  The resulting oops from
trying to use these will give a backtrace and show exactly where the
problem is, including which driver is at fault.

The answer is... fix the driver to provide the required functions
before attempting to use an interface that requires said functions!

> 
> Any suggestions?
> 
> Regards,
>  Vicenç.
> 
> --- a/drivers/net/phy/phy-core.c
> +++ b/drivers/net/phy/phy-core.c
> @@ -648,11 +648,17 @@
> 
> static int __phy_read_page(struct phy_device *phydev)
> {
> +	if (!phydev->drv->read_page)
> +		return -EOPNOTSUPP;
> +	
> 	return phydev->drv->read_page(phydev);
> }
> 
> static int __phy_write_page(struct phy_device *phydev, int page)
> {
> +	if (!phydev->drv->write_page)
> +		return -EOPNOTSUPP;
> +
> 	return phydev->drv->write_page(phydev, page);
> }
> --- a/drivers/net/phy/realtek.c
> +++ b/drivers/net/phy/realtek.c
> @@ -214,8 +214,10 @@
> 	 * for details).
> 	 */
> 	oldpage = phy_select_page(phydev, 0x7);
> -	if (oldpage < 0)
> -		goto err_restore_page;
> +	if (oldpage < 0) {
> +		dev_warn(&phydev->mdio.dev, "Unable to set rgmii delays\n");
> +		return 0;
> +	}
> 
> 	ret = phy_write(phydev, RTL821x_EXT_PAGE_SELECT, 0xa4);
> 	if (ret)
> 
>
Vicente Bergas May 10, 2019, 9:18 p.m. UTC | #3
On Friday, May 10, 2019 10:28:06 PM CEST, Heiner Kallweit wrote:
> On 10.05.2019 17:05, Vicente Bergas wrote:
>> Hello,
>> there is a regression on linux v5.1-9573-gb970afcfcabd with a kernel null
>> pointer dereference.
>> The issue is the commit f81dadbcf7fd067baf184b63c179fc392bdb226e
>>  net: phy: realtek: Add rtl8211e rx/tx delays config ...
> The page operation callbacks are missing in the RTL8211E driver.
> I just submitted a fix adding these callbacks to few Realtek PHY drivers
> including RTl8211E. This should fix the issue.

Thanks for fixing it so fast!

> Nevertheless your proposed patch looks good to me, just one small change
> would be needed and it should be splitted.

The diff on the first mail was just to show which steps i did to debug the
issue, it was not intended to be applied.

Regards,
  Vicenç.

> The change to phy-core I would consider a fix and it should be fine to
> submit it to net (net-next is closed currently).
>
> Adding the warning to the Realtek driver is fine, but this would be
> something for net-next once it's open again.
>
>> Regards,
>>  Vicenç.
>> 
> Heiner
>
>> --- a/drivers/net/phy/phy-core.c
>> +++ b/drivers/net/phy/phy-core.c
>> @@ -648,11 +648,17 @@
>> 
>> static int __phy_read_page(struct phy_device *phydev)
>> { ...
>
> Here phydev_warn() should be used.
>
>> +        return 0;
>> +    }
>> 
>>     ret = phy_write(phydev, RTL821x_EXT_PAGE_SELECT, 0xa4);
>>     if (ret)
Vicente Bergas May 11, 2019, 2:46 p.m. UTC | #4
On Friday, May 10, 2019 10:28:06 PM CEST, Heiner Kallweit wrote:
> On 10.05.2019 17:05, Vicente Bergas wrote:
>> Hello,
>> there is a regression on linux v5.1-9573-gb970afcfcabd with a kernel null
>> pointer dereference.
>> The issue is the commit f81dadbcf7fd067baf184b63c179fc392bdb226e
>>  net: phy: realtek: Add rtl8211e rx/tx delays config ...
> The page operation callbacks are missing in the RTL8211E driver.
> I just submitted a fix adding these callbacks to few Realtek PHY drivers
> including RTl8211E. This should fix the issue.

Hello Heiner,
just tried your patch and indeed the NPE is gone. But still no network...
The MAC <-> PHY link was working before, so, maybe the rgmii delays are not
correctly configured.
With this change it is back to working:
--- a/drivers/net/phy/realtek.c
+++ b/drivers/net/phy/realtek.c
@@ -300,7 +300,6 @@
 	}, {
 		PHY_ID_MATCH_EXACT(0x001cc915),
 		.name		= "RTL8211E Gigabit Ethernet",
-		.config_init	= &rtl8211e_config_init,
 		.ack_interrupt	= &rtl821x_ack_interrupt,
 		.config_intr	= &rtl8211e_config_intr,
 		.suspend	= genphy_suspend,
That is basically reverting the patch.

Regards,
  Vicenç.

> Nevertheless your proposed patch looks good to me, just one small change
> would be needed and it should be splitted.
>
> The change to phy-core I would consider a fix and it should be fine to
> submit it to net (net-next is closed currently).
>
> Adding the warning to the Realtek driver is fine, but this would be
> something for net-next once it's open again.
>
>> Regards,
>>  Vicenç.
>> 
> Heiner
>
>> --- a/drivers/net/phy/phy-core.c
>> +++ b/drivers/net/phy/phy-core.c
>> @@ -648,11 +648,17 @@
>> 
>> static int __phy_read_page(struct phy_device *phydev)
>> { ...
>
> Here phydev_warn() should be used.
>
>> +        return 0;
>> +    }
>> 
>>     ret = phy_write(phydev, RTL821x_EXT_PAGE_SELECT, 0xa4);
>>     if (ret)
Heiner Kallweit May 11, 2019, 2:56 p.m. UTC | #5
On 11.05.2019 16:46, Vicente Bergas wrote:
> On Friday, May 10, 2019 10:28:06 PM CEST, Heiner Kallweit wrote:
>> On 10.05.2019 17:05, Vicente Bergas wrote:
>>> Hello,
>>> there is a regression on linux v5.1-9573-gb970afcfcabd with a kernel null
>>> pointer dereference.
>>> The issue is the commit f81dadbcf7fd067baf184b63c179fc392bdb226e
>>>  net: phy: realtek: Add rtl8211e rx/tx delays config ...
>> The page operation callbacks are missing in the RTL8211E driver.
>> I just submitted a fix adding these callbacks to few Realtek PHY drivers
>> including RTl8211E. This should fix the issue.
> 
> Hello Heiner,
> just tried your patch and indeed the NPE is gone. But still no network...
> The MAC <-> PHY link was working before, so, maybe the rgmii delays are not
> correctly configured.

That's a question to the author of the original patch. My patch was just
meant to fix the NPE. In which configuration are you using the RTL8211E?
As a standalone PHY (with which MAC/driver?) or is it the integrated PHY
in a member of the RTL8168 family?

Serge: The issue with the NPE gave a hint already that you didn't test your
patch. Was your patch based on an actual issue on some board and did you
test it? We may have to consider reverting the patch.

> With this change it is back to working:
> --- a/drivers/net/phy/realtek.c
> +++ b/drivers/net/phy/realtek.c
> @@ -300,7 +300,6 @@
>     }, {
>         PHY_ID_MATCH_EXACT(0x001cc915),
>         .name        = "RTL8211E Gigabit Ethernet",
> -        .config_init    = &rtl8211e_config_init,
>         .ack_interrupt    = &rtl821x_ack_interrupt,
>         .config_intr    = &rtl8211e_config_intr,
>         .suspend    = genphy_suspend,
> That is basically reverting the patch.
> 
> Regards,
>  Vicenç.
> 
>> Nevertheless your proposed patch looks good to me, just one small change
>> would be needed and it should be splitted.
>>
>> The change to phy-core I would consider a fix and it should be fine to
>> submit it to net (net-next is closed currently).
>>
>> Adding the warning to the Realtek driver is fine, but this would be
>> something for net-next once it's open again.
>>
>>> Regards,
>>>  Vicenç.
>>>
>> Heiner
>>
>>> --- a/drivers/net/phy/phy-core.c
>>> +++ b/drivers/net/phy/phy-core.c
>>> @@ -648,11 +648,17 @@
>>>
>>> static int __phy_read_page(struct phy_device *phydev)
>>> { ...
>>
>> Here phydev_warn() should be used.
>>
>>> +        return 0;
>>> +    }
>>>
>>>     ret = phy_write(phydev, RTL821x_EXT_PAGE_SELECT, 0xa4);
>>>     if (ret)
> 
>
Vicente Bergas May 11, 2019, 3:06 p.m. UTC | #6
On Saturday, May 11, 2019 4:56:56 PM CEST, Heiner Kallweit wrote:
> On 11.05.2019 16:46, Vicente Bergas wrote:
>> On Friday, May 10, 2019 10:28:06 PM CEST, Heiner Kallweit wrote:
>>> On 10.05.2019 17:05, Vicente Bergas wrote: ...
>> 
>> Hello Heiner,
>> just tried your patch and indeed the NPE is gone. But still no network...
>> The MAC <-> PHY link was working before, so, maybe the rgmii 
>> delays are not
>> correctly configured.
>
> That's a question to the author of the original patch. My patch was just
> meant to fix the NPE. In which configuration are you using the RTL8211E?
> As a standalone PHY (with which MAC/driver?) or is it the integrated PHY
> in a member of the RTL8168 family?

It is the one on the Sapphire board, so is connected to the MAC on the
RK3399 SoC. It is on page 8 of the schematics:
http://dl.vamrs.com/products/sapphire_excavator/RK_SAPPHIRE_SOCBOARD_RK3399_LPDDR3D178P232SD8_V12_20161109HXS.pdf

> Serge: The issue with the NPE gave a hint already that you didn't test your
> patch. Was your patch based on an actual issue on some board and did you
> test it? We may have to consider reverting the patch.
>
>> With this change it is back to working:
>> --- a/drivers/net/phy/realtek.c
>> +++ b/drivers/net/phy/realtek.c
>> @@ -300,7 +300,6 @@
>>     }, {
>>         PHY_ID_MATCH_EXACT(0x001cc915),
>>         .name        = "RTL8211E Gigabit Ethernet",
>> -        .config_init    = &rtl8211e_config_init,
>>         .ack_interrupt    = &rtl821x_ack_interrupt,
>>         .config_intr    = &rtl8211e_config_intr,
>>         .suspend    = genphy_suspend,
>> That is basically reverting the patch.
>> 
>> Regards,
>>  Vicenç.
>> 
>>> Nevertheless your proposed patch looks good to me, just one small change
>>> would be needed and it should be splitted.
>>> 
>>> The change to phy-core I would consider a fix and it should be fine to
>>> submit it to net (net-next is closed currently).
>>> 
>>> Adding the warning to the Realtek driver is fine, but this would be ...
Andrew Lunn May 11, 2019, 3:08 p.m. UTC | #7
On Sat, May 11, 2019 at 04:46:40PM +0200, Vicente Bergas wrote:
> On Friday, May 10, 2019 10:28:06 PM CEST, Heiner Kallweit wrote:
> >On 10.05.2019 17:05, Vicente Bergas wrote:
> >>Hello,
> >>there is a regression on linux v5.1-9573-gb970afcfcabd with a kernel null
> >>pointer dereference.
> >>The issue is the commit f81dadbcf7fd067baf184b63c179fc392bdb226e
> >> net: phy: realtek: Add rtl8211e rx/tx delays config ...
> >The page operation callbacks are missing in the RTL8211E driver.
> >I just submitted a fix adding these callbacks to few Realtek PHY drivers
> >including RTl8211E. This should fix the issue.
> 
> Hello Heiner,
> just tried your patch and indeed the NPE is gone. But still no network...
> The MAC <-> PHY link was working before, so, maybe the rgmii delays are not
> correctly configured.

Hi Vicente

What phy-mode do you have in device tree? Have you tried the others?

rmgii
rmgii-id
rmgii-rxid
rmgii-txid

	Andrew
Vicente Bergas May 11, 2019, 3:16 p.m. UTC | #8
On Saturday, May 11, 2019 5:08:19 PM CEST, Andrew Lunn wrote:
> On Sat, May 11, 2019 at 04:46:40PM +0200, Vicente Bergas wrote:
>> On Friday, May 10, 2019 10:28:06 PM CEST, Heiner Kallweit wrote:
>>> On 10.05.2019 17:05, Vicente Bergas wrote:
>>>> Hello,
>>>> there is a regression on linux v5.1-9573-gb970afcfcabd with 
>>>> a kernel null
>>>> pointer dereference.
>>>> The issue is the commit f81dadbcf7fd067baf184b63c179fc392bdb226e
>>>> net: phy: realtek: Add rtl8211e rx/tx delays config ...
>>> The page operation callbacks are missing in the RTL8211E driver.
>>> I just submitted a fix adding these callbacks to few Realtek PHY drivers
>>> including RTl8211E. This should fix the issue.
>> 
>> Hello Heiner,
>> just tried your patch and indeed the NPE is gone. But still no network...
>> The MAC <-> PHY link was working before, so, maybe the rgmii 
>> delays are not
>> correctly configured.
>
> Hi Vicente
>
> What phy-mode do you have in device tree? Have you tried the others?
>
> rmgii
> rmgii-id
> rmgii-rxid
> rmgii-txid
>
> 	Andrew

Hi Andrew,
it is configured as in the vanilla kernel:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/arm64/boot/dts/rockchip/rk3399-sapphire.dtsi#n191
,that is,
phy-mode = "rgmii";
There are also these configuration items:
tx_delay = <0x28>;
rx_delay = <0x11>;

Instead of going the trial-and-error way, please, can you suggest a
probably good configuration?

Thanks,
  Vicenç.
Andrew Lunn May 11, 2019, 3:25 p.m. UTC | #9
> Hi Andrew,
> it is configured as in the vanilla kernel:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/arm64/boot/dts/rockchip/rk3399-sapphire.dtsi#n191
> ,that is,
> phy-mode = "rgmii";
> There are also these configuration items:
> tx_delay = <0x28>;
> rx_delay = <0x11>;
> 
> Instead of going the trial-and-error way, please, can you suggest a
> probably good configuration?

I just found the same.

Interestingly, the device tree binding says:

Optional properties:
 - tx_delay: Delay value for TXD timing. Range value is 0~0x7F, 0x30 as default.
 - rx_delay: Delay value for RXD timing. Range value is 0~0x7F, 0x10 as default.

So it is not using quite the default values. But there is no
documentation about what these values mean. Given the difference of
0x20, it could be this is adding the needed TX delay, but not the RX
delay?

So you could try:

rgmii-rxid

And it is not clear what RX and TX mean, so also try

rgmii-txid.

in case they are swapped.

   Andrew
Serge Semin May 13, 2019, 7:29 a.m. UTC | #10
Hello Heiner and net-folks,

On Sat, May 11, 2019 at 04:56:56PM +0200, Heiner Kallweit wrote:
> On 11.05.2019 16:46, Vicente Bergas wrote:
> > On Friday, May 10, 2019 10:28:06 PM CEST, Heiner Kallweit wrote:
> >> On 10.05.2019 17:05, Vicente Bergas wrote:
> >>> Hello,
> >>> there is a regression on linux v5.1-9573-gb970afcfcabd with a kernel null
> >>> pointer dereference.
> >>> The issue is the commit f81dadbcf7fd067baf184b63c179fc392bdb226e
> >>>  net: phy: realtek: Add rtl8211e rx/tx delays config ...
> >> The page operation callbacks are missing in the RTL8211E driver.
> >> I just submitted a fix adding these callbacks to few Realtek PHY drivers
> >> including RTl8211E. This should fix the issue.
> > 
> > Hello Heiner,
> > just tried your patch and indeed the NPE is gone. But still no network...
> > The MAC <-> PHY link was working before, so, maybe the rgmii delays are not
> > correctly configured.
> 
> That's a question to the author of the original patch. My patch was just
> meant to fix the NPE. In which configuration are you using the RTL8211E?
> As a standalone PHY (with which MAC/driver?) or is it the integrated PHY
> in a member of the RTL8168 family?
> 
> Serge: The issue with the NPE gave a hint already that you didn't test your
> patch. Was your patch based on an actual issue on some board and did you
> test it? We may have to consider reverting the patch.
> 

I'm sorry for the problems the patch caused. My fault I couldn't predict the
paged-operations weren't defined for the E-revision of the PHY.

Regarding the patch tests. As I mention in the patchset discussions, the patch
was ported from earlier versions of the kernel. In particular I created it for
kernels 4.4 and 4.9, where paged-operations weren't introduced. So when I moved
it to the modern kernel I found the paged operations availability and decided to
use them, which simplified the code providing a ready-to-use interface to access
the PHY' extension pages. I also found they were defined in the driver with
"rtl821x_" prefix and mistakenly decided, that they were also used for any
rtl-like device. That's where I let the bug to creep in.

Regarding the MAC-PHY link. Without this functionality our custom board of
MAC and rtl8211e PHY doesn't provide a fully supported network, because the
RXDLY and TXDLY pins are grounded so there is no a simple way to set the
RGMII delays on the PHY side.

Concerning the MAC-PHY link problem Vincente found I'll respond to the
corresponding email in three hours.

-Sergey

> > With this change it is back to working:
> > --- a/drivers/net/phy/realtek.c
> > +++ b/drivers/net/phy/realtek.c
> > @@ -300,7 +300,6 @@
> >     }, {
> >         PHY_ID_MATCH_EXACT(0x001cc915),
> >         .name        = "RTL8211E Gigabit Ethernet",
> > -        .config_init    = &rtl8211e_config_init,
> >         .ack_interrupt    = &rtl821x_ack_interrupt,
> >         .config_intr    = &rtl8211e_config_intr,
> >         .suspend    = genphy_suspend,
> > That is basically reverting the patch.
> > 
> > Regards,
> >  Vicenç.
> > 
> >> Nevertheless your proposed patch looks good to me, just one small change
> >> would be needed and it should be splitted.
> >>
> >> The change to phy-core I would consider a fix and it should be fine to
> >> submit it to net (net-next is closed currently).
> >>
> >> Adding the warning to the Realtek driver is fine, but this would be
> >> something for net-next once it's open again.
> >>
> >>> Regards,
> >>>  Vicenç.
> >>>
> >> Heiner
> >>
> >>> --- a/drivers/net/phy/phy-core.c
> >>> +++ b/drivers/net/phy/phy-core.c
> >>> @@ -648,11 +648,17 @@
> >>>
> >>> static int __phy_read_page(struct phy_device *phydev)
> >>> { ...
> >>
> >> Here phydev_warn() should be used.
> >>
> >>> +        return 0;
> >>> +    }
> >>>
> >>>     ret = phy_write(phydev, RTL821x_EXT_PAGE_SELECT, 0xa4);
> >>>     if (ret)
> > 
> > 
>
Serge Semin May 13, 2019, 10:29 a.m. UTC | #11
Hello Vincente,

On Sat, May 11, 2019 at 05:06:04PM +0200, Vicente Bergas wrote:
> On Saturday, May 11, 2019 4:56:56 PM CEST, Heiner Kallweit wrote:
> > On 11.05.2019 16:46, Vicente Bergas wrote:
> > > On Friday, May 10, 2019 10:28:06 PM CEST, Heiner Kallweit wrote:
> > > > On 10.05.2019 17:05, Vicente Bergas wrote: ...
> > > 
> > > Hello Heiner,
> > > just tried your patch and indeed the NPE is gone. But still no network...
> > > The MAC <-> PHY link was working before, so, maybe the rgmii delays
> > > are not
> > > correctly configured.
> > 
> > That's a question to the author of the original patch. My patch was just
> > meant to fix the NPE. In which configuration are you using the RTL8211E?
> > As a standalone PHY (with which MAC/driver?) or is it the integrated PHY
> > in a member of the RTL8168 family?
> 
> It is the one on the Sapphire board, so is connected to the MAC on the
> RK3399 SoC. It is on page 8 of the schematics:
> http://dl.vamrs.com/products/sapphire_excavator/RK_SAPPHIRE_SOCBOARD_RK3399_LPDDR3D178P232SD8_V12_20161109HXS.pdf
> 

Thanks for sending this bug report.

As I said in the commit-message. The idea of this patch is to provide a way
to setup the RGMII delays in the PHY drivers (similar to the most of the PHY
drivers). Before this commit phy-mode dts-node hadn't been taked into account
by the PHY driver, so any PHY-delay setups provided via external pins strapping
were accepted as is. But now rtl8211e phy-mode is parsed as follows:
phy-mode="rgmii" - delays aren't set by PHY (current dts setting in rk3399-sapphire.dtsi)
phy-mode="rgmii-id" - both RX and TX delays are setup on the PHY side,
phy-mode="rgmii-rxid" - only RX delay is setup on the PHY side,
phy-mode="rgmii-txid" - only TX delay is setup on the PHY side.

It means, that now matter what the rtl8211e TXDLY/RXDLY pins are grounded or pulled
high, the delays are going to be setup in accordance with the dts phy-mode settings,
which is supposed to reflect the real hardware setup.

So since you get the problem with MAC<->PHY link, it means your dts-file didn't provide a
correct interface mode. Indeed seeing the sheet on page 7 in the sepphire pdf-file your
rtl8211e PHY is setup to have TXDLY/RXDLY being pulled high, which means to add 2ns delays
by the PHY. This setup corresponds to phy-mode="rgmii-id". As soon as you set it this way
in the rk3399 dts-file, the MAC-PHY link shall work correctly as before.

-Sergey

> > Serge: The issue with the NPE gave a hint already that you didn't test your
> > patch. Was your patch based on an actual issue on some board and did you
> > test it? We may have to consider reverting the patch.
> > 
> > > With this change it is back to working:
> > > --- a/drivers/net/phy/realtek.c
> > > +++ b/drivers/net/phy/realtek.c
> > > @@ -300,7 +300,6 @@
> > >     }, {
> > >         PHY_ID_MATCH_EXACT(0x001cc915),
> > >         .name        = "RTL8211E Gigabit Ethernet",
> > > -        .config_init    = &rtl8211e_config_init,
> > >         .ack_interrupt    = &rtl821x_ack_interrupt,
> > >         .config_intr    = &rtl8211e_config_intr,
> > >         .suspend    = genphy_suspend,
> > > That is basically reverting the patch.
> > > 
> > > Regards,
> > >  Vicenç.
> > > 
> > > > Nevertheless your proposed patch looks good to me, just one small change
> > > > would be needed and it should be splitted.
> > > > 
> > > > The change to phy-core I would consider a fix and it should be fine to
> > > > submit it to net (net-next is closed currently).
> > > > 
> > > > Adding the warning to the Realtek driver is fine, but this would be ...
>
Serge Semin May 13, 2019, 10:51 a.m. UTC | #12
On Mon, May 13, 2019 at 01:29:42PM +0300, Serge Semin wrote:
> Hello Vincente,
> 
> On Sat, May 11, 2019 at 05:06:04PM +0200, Vicente Bergas wrote:
> > On Saturday, May 11, 2019 4:56:56 PM CEST, Heiner Kallweit wrote:
> > > On 11.05.2019 16:46, Vicente Bergas wrote:
> > > > On Friday, May 10, 2019 10:28:06 PM CEST, Heiner Kallweit wrote:
> > > > > On 10.05.2019 17:05, Vicente Bergas wrote: ...
> > > > 
> > > > Hello Heiner,
> > > > just tried your patch and indeed the NPE is gone. But still no network...
> > > > The MAC <-> PHY link was working before, so, maybe the rgmii delays
> > > > are not
> > > > correctly configured.
> > > 
> > > That's a question to the author of the original patch. My patch was just
> > > meant to fix the NPE. In which configuration are you using the RTL8211E?
> > > As a standalone PHY (with which MAC/driver?) or is it the integrated PHY
> > > in a member of the RTL8168 family?
> > 
> > It is the one on the Sapphire board, so is connected to the MAC on the
> > RK3399 SoC. It is on page 8 of the schematics:
> > http://dl.vamrs.com/products/sapphire_excavator/RK_SAPPHIRE_SOCBOARD_RK3399_LPDDR3D178P232SD8_V12_20161109HXS.pdf
> > 
> 
> Thanks for sending this bug report.
> 
> As I said in the commit-message. The idea of this patch is to provide a way
> to setup the RGMII delays in the PHY drivers (similar to the most of the PHY
> drivers). Before this commit phy-mode dts-node hadn't been taked into account
> by the PHY driver, so any PHY-delay setups provided via external pins strapping
> were accepted as is. But now rtl8211e phy-mode is parsed as follows:
> phy-mode="rgmii" - delays aren't set by PHY (current dts setting in rk3399-sapphire.dtsi)
> phy-mode="rgmii-id" - both RX and TX delays are setup on the PHY side,
> phy-mode="rgmii-rxid" - only RX delay is setup on the PHY side,
> phy-mode="rgmii-txid" - only TX delay is setup on the PHY side.
> 
> It means, that now matter what the rtl8211e TXDLY/RXDLY pins are grounded or pulled
> high, the delays are going to be setup in accordance with the dts phy-mode settings,
> which is supposed to reflect the real hardware setup.
> 
> So since you get the problem with MAC<->PHY link, it means your dts-file didn't provide a
> correct interface mode. Indeed seeing the sheet on page 7 in the sepphire pdf-file your
> rtl8211e PHY is setup to have TXDLY/RXDLY being pulled high, which means to add 2ns delays
> by the PHY. This setup corresponds to phy-mode="rgmii-id". As soon as you set it this way
> in the rk3399 dts-file, the MAC-PHY link shall work correctly as before.
> 
> -Sergey

Hmm, just figured out, that in the datasheet RXDLY/TXDLY pins are actually grounded, so
phy-mode="rgmii" should work for you. Are you sure that on your actual hardware the
resistors aren't placed differently?

The current config register state can be read from the 0x1c extension page. Something
like this:
--- a/drivers/net/phy/realtek.c
+++ b/drivers/net/phy/realtek.c
@@ -221,6 +221,9 @@ static int rtl8211e_config_init(struct phy_device *phydev)
 	if (ret)
 		goto err_restore_page;
 
+	ret = phy_read(phydev, 0x1c);
+	dev_info(&phydev->mdio.dev, "PHY config register %08x\n", ret);
+
 	ret = phy_modify(phydev, 0x1c, RTL8211E_TX_DELAY | RTL8211E_RX_DELAY,
 			 val);
 
-Sergey

> 
> > > Serge: The issue with the NPE gave a hint already that you didn't test your
> > > patch. Was your patch based on an actual issue on some board and did you
> > > test it? We may have to consider reverting the patch.
> > > 
> > > > With this change it is back to working:
> > > > --- a/drivers/net/phy/realtek.c
> > > > +++ b/drivers/net/phy/realtek.c
> > > > @@ -300,7 +300,6 @@
> > > >     }, {
> > > >         PHY_ID_MATCH_EXACT(0x001cc915),
> > > >         .name        = "RTL8211E Gigabit Ethernet",
> > > > -        .config_init    = &rtl8211e_config_init,
> > > >         .ack_interrupt    = &rtl821x_ack_interrupt,
> > > >         .config_intr    = &rtl8211e_config_intr,
> > > >         .suspend    = genphy_suspend,
> > > > That is basically reverting the patch.
> > > > 
> > > > Regards,
> > > >  Vicenç.
> > > > 
> > > > > Nevertheless your proposed patch looks good to me, just one small change
> > > > > would be needed and it should be splitted.
> > > > > 
> > > > > The change to phy-core I would consider a fix and it should be fine to
> > > > > submit it to net (net-next is closed currently).
> > > > > 
> > > > > Adding the warning to the Realtek driver is fine, but this would be ...
> >
Vicente Bergas May 13, 2019, 12:19 p.m. UTC | #13
On Monday, May 13, 2019 12:51:05 PM CEST, Serge Semin wrote:
> On Mon, May 13, 2019 at 01:29:42PM +0300, Serge Semin wrote:
>> Hello Vincente,
>> 
>> On Sat, May 11, 2019 at 05:06:04PM +0200, Vicente Bergas wrote: ...
>
> Hmm, just figured out, that in the datasheet RXDLY/TXDLY pins 
> are actually grounded, so
> phy-mode="rgmii" should work for you. Are you sure that on your 
> actual hardware the
> resistors aren't placed differently?

That is correct, the schematic has pull-down resistors and placeholders for
pull-up resistors. On the board I can confirm the pull-ups are not
populated and the pull-downs are.
But the issue seems unrelated to this.

I have traced it down to a deadlock while trying to acquire a mutex.
It hangs here:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/phy/realtek.c?id=47782361aca2#n220
while waiting for this mutex:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/phy/mdio_bus.c?id=47782361aca2#n692

Regards,
  Vicenç.

> The current config register state can be read from the 0x1c 
> extension page. Something
> like this:
> --- a/drivers/net/phy/realtek.c
> +++ b/drivers/net/phy/realtek.c
> @@ -221,6 +221,9 @@ static int rtl8211e_config_init(struct 
> phy_device *phydev)
>  	if (ret)
>  		goto err_restore_page;
>  
> +	ret = phy_read(phydev, 0x1c);
> +	dev_info(&phydev->mdio.dev, "PHY config register %08x\n", ret);
> +
>  	ret = phy_modify(phydev, 0x1c, RTL8211E_TX_DELAY | RTL8211E_RX_DELAY,
>  			 val);
>  
> -Sergey
Serge Semin May 13, 2019, 12:42 p.m. UTC | #14
Hello,

On Mon, May 13, 2019 at 02:19:17PM +0200, Vicente Bergas wrote:
> On Monday, May 13, 2019 12:51:05 PM CEST, Serge Semin wrote:
> > On Mon, May 13, 2019 at 01:29:42PM +0300, Serge Semin wrote:
> > > Hello Vincente,
> > > 
> > > On Sat, May 11, 2019 at 05:06:04PM +0200, Vicente Bergas wrote: ...
> > 
> > Hmm, just figured out, that in the datasheet RXDLY/TXDLY pins are
> > actually grounded, so
> > phy-mode="rgmii" should work for you. Are you sure that on your actual
> > hardware the
> > resistors aren't placed differently?
> 
> That is correct, the schematic has pull-down resistors and placeholders for
> pull-up resistors. On the board I can confirm the pull-ups are not
> populated and the pull-downs are.
> But the issue seems unrelated to this.
> 
> I have traced it down to a deadlock while trying to acquire a mutex.
> It hangs here:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/phy/realtek.c?id=47782361aca2#n220
> while waiting for this mutex:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/phy/mdio_bus.c?id=47782361aca2#n692
> 
> Regards,
>  Vicenç.
> 

Ahh, I see. Then using lock-less version of the access methods must fix the
problem. You could try something like this:
-------------------------------------------------------------------------------
diff --git a/drivers/net/phy/realtek.c b/drivers/net/phy/realtek.c
index 761ce3b1e7bd..14b61da1f32a 100644
--- a/drivers/net/phy/realtek.c
+++ b/drivers/net/phy/realtek.c
@@ -217,12 +217,12 @@ static int rtl8211e_config_init(struct phy_device *phydev)
 	if (oldpage < 0)
 		goto err_restore_page;
 
-	ret = phy_write(phydev, RTL821x_EXT_PAGE_SELECT, 0xa4);
+	ret = __phy_write(phydev, RTL821x_EXT_PAGE_SELECT, 0xa4);
 	if (ret)
 		goto err_restore_page;
 
-	ret = phy_modify(phydev, 0x1c, RTL8211E_TX_DELAY | RTL8211E_RX_DELAY,
-			 val);
+	ret = __phy_modify(phydev, 0x1c, RTL8211E_TX_DELAY | RTL8211E_RX_DELAY,
+			   val);
 
 err_restore_page:
 	return phy_restore_page(phydev, oldpage, ret);
-------------------------------------------------------------------------------

-Sergey

> > The current config register state can be read from the 0x1c extension
> > page. Something
> > like this:
> > --- a/drivers/net/phy/realtek.c
> > +++ b/drivers/net/phy/realtek.c
> > @@ -221,6 +221,9 @@ static int rtl8211e_config_init(struct phy_device
> > *phydev)
> >  	if (ret)
> >  		goto err_restore_page;
> > +	ret = phy_read(phydev, 0x1c);
> > +	dev_info(&phydev->mdio.dev, "PHY config register %08x\n", ret);
> > +
> >  	ret = phy_modify(phydev, 0x1c, RTL8211E_TX_DELAY | RTL8211E_RX_DELAY,
> >  			 val);
> > -Sergey
>
Andrew Lunn May 13, 2019, 12:51 p.m. UTC | #15
> Ahh, I see. Then using lock-less version of the access methods must fix the
> problem. You could try something like this:

Kunihiko Hayash is way ahead of you.

	 Andrew
Serge Semin May 13, 2019, 1:01 p.m. UTC | #16
On Mon, May 13, 2019 at 02:51:03PM +0200, Andrew Lunn wrote:
> > Ahh, I see. Then using lock-less version of the access methods must fix the
> > problem. You could try something like this:
> 
> Kunihiko Hayash is way ahead of you.
> 
> 	 Andrew

I wouldn't say that five hours is "way ahead". But if something fixes a bug in
a patch it would be good to be have the original author being Cc'ed.

Vincente, here is a link to the patch, that fixes the problem.
https://lkml.org/lkml/2019/5/13/43

-Segey
Vicente Bergas May 13, 2019, 1:33 p.m. UTC | #17
On Monday, May 13, 2019 3:01:33 PM CEST, Serge Semin wrote:
> On Mon, May 13, 2019 at 02:51:03PM +0200, Andrew Lunn wrote:
>>> Ahh, I see. Then using lock-less version of the access 
>>> methods must fix the
>>> problem. You could try something like this:
>> 
>> Kunihiko Hayash is way ahead of you.
>> 
>> 	 Andrew
>
> I wouldn't say that five hours is "way ahead". But if something 
> fixes a bug in
> a patch it would be good to be have the original author being Cc'ed.
>
> Vincente, here is a link to the patch, that fixes the problem.
> https://lkml.org/lkml/2019/5/13/43
>
> -Segey

Tested and working OK now.
Thanks for the link.
Rgds.
diff mbox series

Patch

--- a/drivers/net/phy/phy-core.c
+++ b/drivers/net/phy/phy-core.c
@@ -648,11 +648,17 @@ 
 
 static int __phy_read_page(struct phy_device *phydev)
 {
+	if (!phydev->drv->read_page)
+		return -EOPNOTSUPP;
+	
 	return phydev->drv->read_page(phydev);
 }
 
 static int __phy_write_page(struct phy_device *phydev, int page)
 {
+	if (!phydev->drv->write_page)
+		return -EOPNOTSUPP;
+
 	return phydev->drv->write_page(phydev, page);
 }
--- a/drivers/net/phy/realtek.c
+++ b/drivers/net/phy/realtek.c
@@ -214,8 +214,10 @@ 
 	 * for details).
 	 */
 	oldpage = phy_select_page(phydev, 0x7);
-	if (oldpage < 0)
-		goto err_restore_page;
+	if (oldpage < 0) {
+		dev_warn(&phydev->mdio.dev, "Unable to set rgmii delays\n");
+		return 0;
+	}
 
 	ret = phy_write(phydev, RTL821x_EXT_PAGE_SELECT, 0xa4);
 	if (ret)