Message ID | 16f75ff4-e3e3-4d96-b084-e772e3ce1c2b@gmail.com |
---|---|
State | RFC |
Delegated to: | David Miller |
Headers | show |
Series | net: phy: realtek: regression, kernel null pointer dereference | expand |
On 10.05.2019 17:05, Vicente Bergas wrote: > Hello, > there is a regression on linux v5.1-9573-gb970afcfcabd with a kernel null > pointer dereference. > The issue is the commit f81dadbcf7fd067baf184b63c179fc392bdb226e > net: phy: realtek: Add rtl8211e rx/tx delays config > which uncovered a bug in phy-core when attempting to call > phydev->drv->read_page > which can be null. > The patch to drivers/net/phy/phy-core.c below fixes the kernel null pointer > dereference. After applying the patch, there is still no network. I have > also tested the patch to drivers/net/phy/realtek.c, but no success. The > system hangs forever while initializing eth0. > > Any suggestions? > The page operation callbacks are missing in the RTL8211E driver. I just submitted a fix adding these callbacks to few Realtek PHY drivers including RTl8211E. This should fix the issue. Nevertheless your proposed patch looks good to me, just one small change would be needed and it should be splitted. The change to phy-core I would consider a fix and it should be fine to submit it to net (net-next is closed currently). Adding the warning to the Realtek driver is fine, but this would be something for net-next once it's open again. > Regards, > Vicenç. > Heiner > --- a/drivers/net/phy/phy-core.c > +++ b/drivers/net/phy/phy-core.c > @@ -648,11 +648,17 @@ > > static int __phy_read_page(struct phy_device *phydev) > { > + if (!phydev->drv->read_page) > + return -EOPNOTSUPP; > + > return phydev->drv->read_page(phydev); > } > > static int __phy_write_page(struct phy_device *phydev, int page) > { > + if (!phydev->drv->write_page) > + return -EOPNOTSUPP; > + > return phydev->drv->write_page(phydev, page); > } > --- a/drivers/net/phy/realtek.c > +++ b/drivers/net/phy/realtek.c > @@ -214,8 +214,10 @@ > * for details). > */ > oldpage = phy_select_page(phydev, 0x7); > - if (oldpage < 0) > - goto err_restore_page; > + if (oldpage < 0) { > + dev_warn(&phydev->mdio.dev, "Unable to set rgmii delays\n"); Here phydev_warn() should be used. > + return 0; > + } > > ret = phy_write(phydev, RTL821x_EXT_PAGE_SELECT, 0xa4); > if (ret) > >
On Fri, May 10, 2019 at 05:05:13PM +0200, Vicente Bergas wrote: > Hello, > there is a regression on linux v5.1-9573-gb970afcfcabd with a kernel null > pointer dereference. > The issue is the commit f81dadbcf7fd067baf184b63c179fc392bdb226e > net: phy: realtek: Add rtl8211e rx/tx delays config > which uncovered a bug in phy-core when attempting to call > phydev->drv->read_page > which can be null. > The patch to drivers/net/phy/phy-core.c below fixes the kernel null pointer > dereference. After applying the patch, there is still no network. I have > also tested the patch to drivers/net/phy/realtek.c, but no success. The > system hangs forever while initializing eth0. You're not supposed to call these functions unless you provide the page read/write page functions. The fact that this code has crept in shows that the patch adding the call to phy_select_page() in the realtek driver was patently never tested, which, IMHO is bad software engineering practice. No, it's not even engineering practice, it's an untested hack. I don't see any point in adding run-time checks - that will only add additional code, and we lose the backtrace. The resulting oops from trying to use these will give a backtrace and show exactly where the problem is, including which driver is at fault. The answer is... fix the driver to provide the required functions before attempting to use an interface that requires said functions! > > Any suggestions? > > Regards, > Vicenç. > > --- a/drivers/net/phy/phy-core.c > +++ b/drivers/net/phy/phy-core.c > @@ -648,11 +648,17 @@ > > static int __phy_read_page(struct phy_device *phydev) > { > + if (!phydev->drv->read_page) > + return -EOPNOTSUPP; > + > return phydev->drv->read_page(phydev); > } > > static int __phy_write_page(struct phy_device *phydev, int page) > { > + if (!phydev->drv->write_page) > + return -EOPNOTSUPP; > + > return phydev->drv->write_page(phydev, page); > } > --- a/drivers/net/phy/realtek.c > +++ b/drivers/net/phy/realtek.c > @@ -214,8 +214,10 @@ > * for details). > */ > oldpage = phy_select_page(phydev, 0x7); > - if (oldpage < 0) > - goto err_restore_page; > + if (oldpage < 0) { > + dev_warn(&phydev->mdio.dev, "Unable to set rgmii delays\n"); > + return 0; > + } > > ret = phy_write(phydev, RTL821x_EXT_PAGE_SELECT, 0xa4); > if (ret) > >
On Friday, May 10, 2019 10:28:06 PM CEST, Heiner Kallweit wrote: > On 10.05.2019 17:05, Vicente Bergas wrote: >> Hello, >> there is a regression on linux v5.1-9573-gb970afcfcabd with a kernel null >> pointer dereference. >> The issue is the commit f81dadbcf7fd067baf184b63c179fc392bdb226e >> net: phy: realtek: Add rtl8211e rx/tx delays config ... > The page operation callbacks are missing in the RTL8211E driver. > I just submitted a fix adding these callbacks to few Realtek PHY drivers > including RTl8211E. This should fix the issue. Thanks for fixing it so fast! > Nevertheless your proposed patch looks good to me, just one small change > would be needed and it should be splitted. The diff on the first mail was just to show which steps i did to debug the issue, it was not intended to be applied. Regards, Vicenç. > The change to phy-core I would consider a fix and it should be fine to > submit it to net (net-next is closed currently). > > Adding the warning to the Realtek driver is fine, but this would be > something for net-next once it's open again. > >> Regards, >> Vicenç. >> > Heiner > >> --- a/drivers/net/phy/phy-core.c >> +++ b/drivers/net/phy/phy-core.c >> @@ -648,11 +648,17 @@ >> >> static int __phy_read_page(struct phy_device *phydev) >> { ... > > Here phydev_warn() should be used. > >> + return 0; >> + } >> >> ret = phy_write(phydev, RTL821x_EXT_PAGE_SELECT, 0xa4); >> if (ret)
On Friday, May 10, 2019 10:28:06 PM CEST, Heiner Kallweit wrote: > On 10.05.2019 17:05, Vicente Bergas wrote: >> Hello, >> there is a regression on linux v5.1-9573-gb970afcfcabd with a kernel null >> pointer dereference. >> The issue is the commit f81dadbcf7fd067baf184b63c179fc392bdb226e >> net: phy: realtek: Add rtl8211e rx/tx delays config ... > The page operation callbacks are missing in the RTL8211E driver. > I just submitted a fix adding these callbacks to few Realtek PHY drivers > including RTl8211E. This should fix the issue. Hello Heiner, just tried your patch and indeed the NPE is gone. But still no network... The MAC <-> PHY link was working before, so, maybe the rgmii delays are not correctly configured. With this change it is back to working: --- a/drivers/net/phy/realtek.c +++ b/drivers/net/phy/realtek.c @@ -300,7 +300,6 @@ }, { PHY_ID_MATCH_EXACT(0x001cc915), .name = "RTL8211E Gigabit Ethernet", - .config_init = &rtl8211e_config_init, .ack_interrupt = &rtl821x_ack_interrupt, .config_intr = &rtl8211e_config_intr, .suspend = genphy_suspend, That is basically reverting the patch. Regards, Vicenç. > Nevertheless your proposed patch looks good to me, just one small change > would be needed and it should be splitted. > > The change to phy-core I would consider a fix and it should be fine to > submit it to net (net-next is closed currently). > > Adding the warning to the Realtek driver is fine, but this would be > something for net-next once it's open again. > >> Regards, >> Vicenç. >> > Heiner > >> --- a/drivers/net/phy/phy-core.c >> +++ b/drivers/net/phy/phy-core.c >> @@ -648,11 +648,17 @@ >> >> static int __phy_read_page(struct phy_device *phydev) >> { ... > > Here phydev_warn() should be used. > >> + return 0; >> + } >> >> ret = phy_write(phydev, RTL821x_EXT_PAGE_SELECT, 0xa4); >> if (ret)
On 11.05.2019 16:46, Vicente Bergas wrote: > On Friday, May 10, 2019 10:28:06 PM CEST, Heiner Kallweit wrote: >> On 10.05.2019 17:05, Vicente Bergas wrote: >>> Hello, >>> there is a regression on linux v5.1-9573-gb970afcfcabd with a kernel null >>> pointer dereference. >>> The issue is the commit f81dadbcf7fd067baf184b63c179fc392bdb226e >>> net: phy: realtek: Add rtl8211e rx/tx delays config ... >> The page operation callbacks are missing in the RTL8211E driver. >> I just submitted a fix adding these callbacks to few Realtek PHY drivers >> including RTl8211E. This should fix the issue. > > Hello Heiner, > just tried your patch and indeed the NPE is gone. But still no network... > The MAC <-> PHY link was working before, so, maybe the rgmii delays are not > correctly configured. That's a question to the author of the original patch. My patch was just meant to fix the NPE. In which configuration are you using the RTL8211E? As a standalone PHY (with which MAC/driver?) or is it the integrated PHY in a member of the RTL8168 family? Serge: The issue with the NPE gave a hint already that you didn't test your patch. Was your patch based on an actual issue on some board and did you test it? We may have to consider reverting the patch. > With this change it is back to working: > --- a/drivers/net/phy/realtek.c > +++ b/drivers/net/phy/realtek.c > @@ -300,7 +300,6 @@ > }, { > PHY_ID_MATCH_EXACT(0x001cc915), > .name = "RTL8211E Gigabit Ethernet", > - .config_init = &rtl8211e_config_init, > .ack_interrupt = &rtl821x_ack_interrupt, > .config_intr = &rtl8211e_config_intr, > .suspend = genphy_suspend, > That is basically reverting the patch. > > Regards, > Vicenç. > >> Nevertheless your proposed patch looks good to me, just one small change >> would be needed and it should be splitted. >> >> The change to phy-core I would consider a fix and it should be fine to >> submit it to net (net-next is closed currently). >> >> Adding the warning to the Realtek driver is fine, but this would be >> something for net-next once it's open again. >> >>> Regards, >>> Vicenç. >>> >> Heiner >> >>> --- a/drivers/net/phy/phy-core.c >>> +++ b/drivers/net/phy/phy-core.c >>> @@ -648,11 +648,17 @@ >>> >>> static int __phy_read_page(struct phy_device *phydev) >>> { ... >> >> Here phydev_warn() should be used. >> >>> + return 0; >>> + } >>> >>> ret = phy_write(phydev, RTL821x_EXT_PAGE_SELECT, 0xa4); >>> if (ret) > >
On Saturday, May 11, 2019 4:56:56 PM CEST, Heiner Kallweit wrote: > On 11.05.2019 16:46, Vicente Bergas wrote: >> On Friday, May 10, 2019 10:28:06 PM CEST, Heiner Kallweit wrote: >>> On 10.05.2019 17:05, Vicente Bergas wrote: ... >> >> Hello Heiner, >> just tried your patch and indeed the NPE is gone. But still no network... >> The MAC <-> PHY link was working before, so, maybe the rgmii >> delays are not >> correctly configured. > > That's a question to the author of the original patch. My patch was just > meant to fix the NPE. In which configuration are you using the RTL8211E? > As a standalone PHY (with which MAC/driver?) or is it the integrated PHY > in a member of the RTL8168 family? It is the one on the Sapphire board, so is connected to the MAC on the RK3399 SoC. It is on page 8 of the schematics: http://dl.vamrs.com/products/sapphire_excavator/RK_SAPPHIRE_SOCBOARD_RK3399_LPDDR3D178P232SD8_V12_20161109HXS.pdf > Serge: The issue with the NPE gave a hint already that you didn't test your > patch. Was your patch based on an actual issue on some board and did you > test it? We may have to consider reverting the patch. > >> With this change it is back to working: >> --- a/drivers/net/phy/realtek.c >> +++ b/drivers/net/phy/realtek.c >> @@ -300,7 +300,6 @@ >> }, { >> PHY_ID_MATCH_EXACT(0x001cc915), >> .name = "RTL8211E Gigabit Ethernet", >> - .config_init = &rtl8211e_config_init, >> .ack_interrupt = &rtl821x_ack_interrupt, >> .config_intr = &rtl8211e_config_intr, >> .suspend = genphy_suspend, >> That is basically reverting the patch. >> >> Regards, >> Vicenç. >> >>> Nevertheless your proposed patch looks good to me, just one small change >>> would be needed and it should be splitted. >>> >>> The change to phy-core I would consider a fix and it should be fine to >>> submit it to net (net-next is closed currently). >>> >>> Adding the warning to the Realtek driver is fine, but this would be ...
On Sat, May 11, 2019 at 04:46:40PM +0200, Vicente Bergas wrote: > On Friday, May 10, 2019 10:28:06 PM CEST, Heiner Kallweit wrote: > >On 10.05.2019 17:05, Vicente Bergas wrote: > >>Hello, > >>there is a regression on linux v5.1-9573-gb970afcfcabd with a kernel null > >>pointer dereference. > >>The issue is the commit f81dadbcf7fd067baf184b63c179fc392bdb226e > >> net: phy: realtek: Add rtl8211e rx/tx delays config ... > >The page operation callbacks are missing in the RTL8211E driver. > >I just submitted a fix adding these callbacks to few Realtek PHY drivers > >including RTl8211E. This should fix the issue. > > Hello Heiner, > just tried your patch and indeed the NPE is gone. But still no network... > The MAC <-> PHY link was working before, so, maybe the rgmii delays are not > correctly configured. Hi Vicente What phy-mode do you have in device tree? Have you tried the others? rmgii rmgii-id rmgii-rxid rmgii-txid Andrew
On Saturday, May 11, 2019 5:08:19 PM CEST, Andrew Lunn wrote: > On Sat, May 11, 2019 at 04:46:40PM +0200, Vicente Bergas wrote: >> On Friday, May 10, 2019 10:28:06 PM CEST, Heiner Kallweit wrote: >>> On 10.05.2019 17:05, Vicente Bergas wrote: >>>> Hello, >>>> there is a regression on linux v5.1-9573-gb970afcfcabd with >>>> a kernel null >>>> pointer dereference. >>>> The issue is the commit f81dadbcf7fd067baf184b63c179fc392bdb226e >>>> net: phy: realtek: Add rtl8211e rx/tx delays config ... >>> The page operation callbacks are missing in the RTL8211E driver. >>> I just submitted a fix adding these callbacks to few Realtek PHY drivers >>> including RTl8211E. This should fix the issue. >> >> Hello Heiner, >> just tried your patch and indeed the NPE is gone. But still no network... >> The MAC <-> PHY link was working before, so, maybe the rgmii >> delays are not >> correctly configured. > > Hi Vicente > > What phy-mode do you have in device tree? Have you tried the others? > > rmgii > rmgii-id > rmgii-rxid > rmgii-txid > > Andrew Hi Andrew, it is configured as in the vanilla kernel: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/arm64/boot/dts/rockchip/rk3399-sapphire.dtsi#n191 ,that is, phy-mode = "rgmii"; There are also these configuration items: tx_delay = <0x28>; rx_delay = <0x11>; Instead of going the trial-and-error way, please, can you suggest a probably good configuration? Thanks, Vicenç.
> Hi Andrew, > it is configured as in the vanilla kernel: > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/arm64/boot/dts/rockchip/rk3399-sapphire.dtsi#n191 > ,that is, > phy-mode = "rgmii"; > There are also these configuration items: > tx_delay = <0x28>; > rx_delay = <0x11>; > > Instead of going the trial-and-error way, please, can you suggest a > probably good configuration? I just found the same. Interestingly, the device tree binding says: Optional properties: - tx_delay: Delay value for TXD timing. Range value is 0~0x7F, 0x30 as default. - rx_delay: Delay value for RXD timing. Range value is 0~0x7F, 0x10 as default. So it is not using quite the default values. But there is no documentation about what these values mean. Given the difference of 0x20, it could be this is adding the needed TX delay, but not the RX delay? So you could try: rgmii-rxid And it is not clear what RX and TX mean, so also try rgmii-txid. in case they are swapped. Andrew
Hello Heiner and net-folks, On Sat, May 11, 2019 at 04:56:56PM +0200, Heiner Kallweit wrote: > On 11.05.2019 16:46, Vicente Bergas wrote: > > On Friday, May 10, 2019 10:28:06 PM CEST, Heiner Kallweit wrote: > >> On 10.05.2019 17:05, Vicente Bergas wrote: > >>> Hello, > >>> there is a regression on linux v5.1-9573-gb970afcfcabd with a kernel null > >>> pointer dereference. > >>> The issue is the commit f81dadbcf7fd067baf184b63c179fc392bdb226e > >>> net: phy: realtek: Add rtl8211e rx/tx delays config ... > >> The page operation callbacks are missing in the RTL8211E driver. > >> I just submitted a fix adding these callbacks to few Realtek PHY drivers > >> including RTl8211E. This should fix the issue. > > > > Hello Heiner, > > just tried your patch and indeed the NPE is gone. But still no network... > > The MAC <-> PHY link was working before, so, maybe the rgmii delays are not > > correctly configured. > > That's a question to the author of the original patch. My patch was just > meant to fix the NPE. In which configuration are you using the RTL8211E? > As a standalone PHY (with which MAC/driver?) or is it the integrated PHY > in a member of the RTL8168 family? > > Serge: The issue with the NPE gave a hint already that you didn't test your > patch. Was your patch based on an actual issue on some board and did you > test it? We may have to consider reverting the patch. > I'm sorry for the problems the patch caused. My fault I couldn't predict the paged-operations weren't defined for the E-revision of the PHY. Regarding the patch tests. As I mention in the patchset discussions, the patch was ported from earlier versions of the kernel. In particular I created it for kernels 4.4 and 4.9, where paged-operations weren't introduced. So when I moved it to the modern kernel I found the paged operations availability and decided to use them, which simplified the code providing a ready-to-use interface to access the PHY' extension pages. I also found they were defined in the driver with "rtl821x_" prefix and mistakenly decided, that they were also used for any rtl-like device. That's where I let the bug to creep in. Regarding the MAC-PHY link. Without this functionality our custom board of MAC and rtl8211e PHY doesn't provide a fully supported network, because the RXDLY and TXDLY pins are grounded so there is no a simple way to set the RGMII delays on the PHY side. Concerning the MAC-PHY link problem Vincente found I'll respond to the corresponding email in three hours. -Sergey > > With this change it is back to working: > > --- a/drivers/net/phy/realtek.c > > +++ b/drivers/net/phy/realtek.c > > @@ -300,7 +300,6 @@ > > }, { > > PHY_ID_MATCH_EXACT(0x001cc915), > > .name = "RTL8211E Gigabit Ethernet", > > - .config_init = &rtl8211e_config_init, > > .ack_interrupt = &rtl821x_ack_interrupt, > > .config_intr = &rtl8211e_config_intr, > > .suspend = genphy_suspend, > > That is basically reverting the patch. > > > > Regards, > > Vicenç. > > > >> Nevertheless your proposed patch looks good to me, just one small change > >> would be needed and it should be splitted. > >> > >> The change to phy-core I would consider a fix and it should be fine to > >> submit it to net (net-next is closed currently). > >> > >> Adding the warning to the Realtek driver is fine, but this would be > >> something for net-next once it's open again. > >> > >>> Regards, > >>> Vicenç. > >>> > >> Heiner > >> > >>> --- a/drivers/net/phy/phy-core.c > >>> +++ b/drivers/net/phy/phy-core.c > >>> @@ -648,11 +648,17 @@ > >>> > >>> static int __phy_read_page(struct phy_device *phydev) > >>> { ... > >> > >> Here phydev_warn() should be used. > >> > >>> + return 0; > >>> + } > >>> > >>> ret = phy_write(phydev, RTL821x_EXT_PAGE_SELECT, 0xa4); > >>> if (ret) > > > > >
Hello Vincente, On Sat, May 11, 2019 at 05:06:04PM +0200, Vicente Bergas wrote: > On Saturday, May 11, 2019 4:56:56 PM CEST, Heiner Kallweit wrote: > > On 11.05.2019 16:46, Vicente Bergas wrote: > > > On Friday, May 10, 2019 10:28:06 PM CEST, Heiner Kallweit wrote: > > > > On 10.05.2019 17:05, Vicente Bergas wrote: ... > > > > > > Hello Heiner, > > > just tried your patch and indeed the NPE is gone. But still no network... > > > The MAC <-> PHY link was working before, so, maybe the rgmii delays > > > are not > > > correctly configured. > > > > That's a question to the author of the original patch. My patch was just > > meant to fix the NPE. In which configuration are you using the RTL8211E? > > As a standalone PHY (with which MAC/driver?) or is it the integrated PHY > > in a member of the RTL8168 family? > > It is the one on the Sapphire board, so is connected to the MAC on the > RK3399 SoC. It is on page 8 of the schematics: > http://dl.vamrs.com/products/sapphire_excavator/RK_SAPPHIRE_SOCBOARD_RK3399_LPDDR3D178P232SD8_V12_20161109HXS.pdf > Thanks for sending this bug report. As I said in the commit-message. The idea of this patch is to provide a way to setup the RGMII delays in the PHY drivers (similar to the most of the PHY drivers). Before this commit phy-mode dts-node hadn't been taked into account by the PHY driver, so any PHY-delay setups provided via external pins strapping were accepted as is. But now rtl8211e phy-mode is parsed as follows: phy-mode="rgmii" - delays aren't set by PHY (current dts setting in rk3399-sapphire.dtsi) phy-mode="rgmii-id" - both RX and TX delays are setup on the PHY side, phy-mode="rgmii-rxid" - only RX delay is setup on the PHY side, phy-mode="rgmii-txid" - only TX delay is setup on the PHY side. It means, that now matter what the rtl8211e TXDLY/RXDLY pins are grounded or pulled high, the delays are going to be setup in accordance with the dts phy-mode settings, which is supposed to reflect the real hardware setup. So since you get the problem with MAC<->PHY link, it means your dts-file didn't provide a correct interface mode. Indeed seeing the sheet on page 7 in the sepphire pdf-file your rtl8211e PHY is setup to have TXDLY/RXDLY being pulled high, which means to add 2ns delays by the PHY. This setup corresponds to phy-mode="rgmii-id". As soon as you set it this way in the rk3399 dts-file, the MAC-PHY link shall work correctly as before. -Sergey > > Serge: The issue with the NPE gave a hint already that you didn't test your > > patch. Was your patch based on an actual issue on some board and did you > > test it? We may have to consider reverting the patch. > > > > > With this change it is back to working: > > > --- a/drivers/net/phy/realtek.c > > > +++ b/drivers/net/phy/realtek.c > > > @@ -300,7 +300,6 @@ > > > }, { > > > PHY_ID_MATCH_EXACT(0x001cc915), > > > .name = "RTL8211E Gigabit Ethernet", > > > - .config_init = &rtl8211e_config_init, > > > .ack_interrupt = &rtl821x_ack_interrupt, > > > .config_intr = &rtl8211e_config_intr, > > > .suspend = genphy_suspend, > > > That is basically reverting the patch. > > > > > > Regards, > > > Vicenç. > > > > > > > Nevertheless your proposed patch looks good to me, just one small change > > > > would be needed and it should be splitted. > > > > > > > > The change to phy-core I would consider a fix and it should be fine to > > > > submit it to net (net-next is closed currently). > > > > > > > > Adding the warning to the Realtek driver is fine, but this would be ... >
On Mon, May 13, 2019 at 01:29:42PM +0300, Serge Semin wrote: > Hello Vincente, > > On Sat, May 11, 2019 at 05:06:04PM +0200, Vicente Bergas wrote: > > On Saturday, May 11, 2019 4:56:56 PM CEST, Heiner Kallweit wrote: > > > On 11.05.2019 16:46, Vicente Bergas wrote: > > > > On Friday, May 10, 2019 10:28:06 PM CEST, Heiner Kallweit wrote: > > > > > On 10.05.2019 17:05, Vicente Bergas wrote: ... > > > > > > > > Hello Heiner, > > > > just tried your patch and indeed the NPE is gone. But still no network... > > > > The MAC <-> PHY link was working before, so, maybe the rgmii delays > > > > are not > > > > correctly configured. > > > > > > That's a question to the author of the original patch. My patch was just > > > meant to fix the NPE. In which configuration are you using the RTL8211E? > > > As a standalone PHY (with which MAC/driver?) or is it the integrated PHY > > > in a member of the RTL8168 family? > > > > It is the one on the Sapphire board, so is connected to the MAC on the > > RK3399 SoC. It is on page 8 of the schematics: > > http://dl.vamrs.com/products/sapphire_excavator/RK_SAPPHIRE_SOCBOARD_RK3399_LPDDR3D178P232SD8_V12_20161109HXS.pdf > > > > Thanks for sending this bug report. > > As I said in the commit-message. The idea of this patch is to provide a way > to setup the RGMII delays in the PHY drivers (similar to the most of the PHY > drivers). Before this commit phy-mode dts-node hadn't been taked into account > by the PHY driver, so any PHY-delay setups provided via external pins strapping > were accepted as is. But now rtl8211e phy-mode is parsed as follows: > phy-mode="rgmii" - delays aren't set by PHY (current dts setting in rk3399-sapphire.dtsi) > phy-mode="rgmii-id" - both RX and TX delays are setup on the PHY side, > phy-mode="rgmii-rxid" - only RX delay is setup on the PHY side, > phy-mode="rgmii-txid" - only TX delay is setup on the PHY side. > > It means, that now matter what the rtl8211e TXDLY/RXDLY pins are grounded or pulled > high, the delays are going to be setup in accordance with the dts phy-mode settings, > which is supposed to reflect the real hardware setup. > > So since you get the problem with MAC<->PHY link, it means your dts-file didn't provide a > correct interface mode. Indeed seeing the sheet on page 7 in the sepphire pdf-file your > rtl8211e PHY is setup to have TXDLY/RXDLY being pulled high, which means to add 2ns delays > by the PHY. This setup corresponds to phy-mode="rgmii-id". As soon as you set it this way > in the rk3399 dts-file, the MAC-PHY link shall work correctly as before. > > -Sergey Hmm, just figured out, that in the datasheet RXDLY/TXDLY pins are actually grounded, so phy-mode="rgmii" should work for you. Are you sure that on your actual hardware the resistors aren't placed differently? The current config register state can be read from the 0x1c extension page. Something like this: --- a/drivers/net/phy/realtek.c +++ b/drivers/net/phy/realtek.c @@ -221,6 +221,9 @@ static int rtl8211e_config_init(struct phy_device *phydev) if (ret) goto err_restore_page; + ret = phy_read(phydev, 0x1c); + dev_info(&phydev->mdio.dev, "PHY config register %08x\n", ret); + ret = phy_modify(phydev, 0x1c, RTL8211E_TX_DELAY | RTL8211E_RX_DELAY, val); -Sergey > > > > Serge: The issue with the NPE gave a hint already that you didn't test your > > > patch. Was your patch based on an actual issue on some board and did you > > > test it? We may have to consider reverting the patch. > > > > > > > With this change it is back to working: > > > > --- a/drivers/net/phy/realtek.c > > > > +++ b/drivers/net/phy/realtek.c > > > > @@ -300,7 +300,6 @@ > > > > }, { > > > > PHY_ID_MATCH_EXACT(0x001cc915), > > > > .name = "RTL8211E Gigabit Ethernet", > > > > - .config_init = &rtl8211e_config_init, > > > > .ack_interrupt = &rtl821x_ack_interrupt, > > > > .config_intr = &rtl8211e_config_intr, > > > > .suspend = genphy_suspend, > > > > That is basically reverting the patch. > > > > > > > > Regards, > > > > Vicenç. > > > > > > > > > Nevertheless your proposed patch looks good to me, just one small change > > > > > would be needed and it should be splitted. > > > > > > > > > > The change to phy-core I would consider a fix and it should be fine to > > > > > submit it to net (net-next is closed currently). > > > > > > > > > > Adding the warning to the Realtek driver is fine, but this would be ... > >
On Monday, May 13, 2019 12:51:05 PM CEST, Serge Semin wrote: > On Mon, May 13, 2019 at 01:29:42PM +0300, Serge Semin wrote: >> Hello Vincente, >> >> On Sat, May 11, 2019 at 05:06:04PM +0200, Vicente Bergas wrote: ... > > Hmm, just figured out, that in the datasheet RXDLY/TXDLY pins > are actually grounded, so > phy-mode="rgmii" should work for you. Are you sure that on your > actual hardware the > resistors aren't placed differently? That is correct, the schematic has pull-down resistors and placeholders for pull-up resistors. On the board I can confirm the pull-ups are not populated and the pull-downs are. But the issue seems unrelated to this. I have traced it down to a deadlock while trying to acquire a mutex. It hangs here: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/phy/realtek.c?id=47782361aca2#n220 while waiting for this mutex: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/phy/mdio_bus.c?id=47782361aca2#n692 Regards, Vicenç. > The current config register state can be read from the 0x1c > extension page. Something > like this: > --- a/drivers/net/phy/realtek.c > +++ b/drivers/net/phy/realtek.c > @@ -221,6 +221,9 @@ static int rtl8211e_config_init(struct > phy_device *phydev) > if (ret) > goto err_restore_page; > > + ret = phy_read(phydev, 0x1c); > + dev_info(&phydev->mdio.dev, "PHY config register %08x\n", ret); > + > ret = phy_modify(phydev, 0x1c, RTL8211E_TX_DELAY | RTL8211E_RX_DELAY, > val); > > -Sergey
Hello, On Mon, May 13, 2019 at 02:19:17PM +0200, Vicente Bergas wrote: > On Monday, May 13, 2019 12:51:05 PM CEST, Serge Semin wrote: > > On Mon, May 13, 2019 at 01:29:42PM +0300, Serge Semin wrote: > > > Hello Vincente, > > > > > > On Sat, May 11, 2019 at 05:06:04PM +0200, Vicente Bergas wrote: ... > > > > Hmm, just figured out, that in the datasheet RXDLY/TXDLY pins are > > actually grounded, so > > phy-mode="rgmii" should work for you. Are you sure that on your actual > > hardware the > > resistors aren't placed differently? > > That is correct, the schematic has pull-down resistors and placeholders for > pull-up resistors. On the board I can confirm the pull-ups are not > populated and the pull-downs are. > But the issue seems unrelated to this. > > I have traced it down to a deadlock while trying to acquire a mutex. > It hangs here: > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/phy/realtek.c?id=47782361aca2#n220 > while waiting for this mutex: > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/phy/mdio_bus.c?id=47782361aca2#n692 > > Regards, > Vicenç. > Ahh, I see. Then using lock-less version of the access methods must fix the problem. You could try something like this: ------------------------------------------------------------------------------- diff --git a/drivers/net/phy/realtek.c b/drivers/net/phy/realtek.c index 761ce3b1e7bd..14b61da1f32a 100644 --- a/drivers/net/phy/realtek.c +++ b/drivers/net/phy/realtek.c @@ -217,12 +217,12 @@ static int rtl8211e_config_init(struct phy_device *phydev) if (oldpage < 0) goto err_restore_page; - ret = phy_write(phydev, RTL821x_EXT_PAGE_SELECT, 0xa4); + ret = __phy_write(phydev, RTL821x_EXT_PAGE_SELECT, 0xa4); if (ret) goto err_restore_page; - ret = phy_modify(phydev, 0x1c, RTL8211E_TX_DELAY | RTL8211E_RX_DELAY, - val); + ret = __phy_modify(phydev, 0x1c, RTL8211E_TX_DELAY | RTL8211E_RX_DELAY, + val); err_restore_page: return phy_restore_page(phydev, oldpage, ret); ------------------------------------------------------------------------------- -Sergey > > The current config register state can be read from the 0x1c extension > > page. Something > > like this: > > --- a/drivers/net/phy/realtek.c > > +++ b/drivers/net/phy/realtek.c > > @@ -221,6 +221,9 @@ static int rtl8211e_config_init(struct phy_device > > *phydev) > > if (ret) > > goto err_restore_page; > > + ret = phy_read(phydev, 0x1c); > > + dev_info(&phydev->mdio.dev, "PHY config register %08x\n", ret); > > + > > ret = phy_modify(phydev, 0x1c, RTL8211E_TX_DELAY | RTL8211E_RX_DELAY, > > val); > > -Sergey >
> Ahh, I see. Then using lock-less version of the access methods must fix the > problem. You could try something like this: Kunihiko Hayash is way ahead of you. Andrew
On Mon, May 13, 2019 at 02:51:03PM +0200, Andrew Lunn wrote: > > Ahh, I see. Then using lock-less version of the access methods must fix the > > problem. You could try something like this: > > Kunihiko Hayash is way ahead of you. > > Andrew I wouldn't say that five hours is "way ahead". But if something fixes a bug in a patch it would be good to be have the original author being Cc'ed. Vincente, here is a link to the patch, that fixes the problem. https://lkml.org/lkml/2019/5/13/43 -Segey
On Monday, May 13, 2019 3:01:33 PM CEST, Serge Semin wrote: > On Mon, May 13, 2019 at 02:51:03PM +0200, Andrew Lunn wrote: >>> Ahh, I see. Then using lock-less version of the access >>> methods must fix the >>> problem. You could try something like this: >> >> Kunihiko Hayash is way ahead of you. >> >> Andrew > > I wouldn't say that five hours is "way ahead". But if something > fixes a bug in > a patch it would be good to be have the original author being Cc'ed. > > Vincente, here is a link to the patch, that fixes the problem. > https://lkml.org/lkml/2019/5/13/43 > > -Segey Tested and working OK now. Thanks for the link. Rgds.
--- a/drivers/net/phy/phy-core.c +++ b/drivers/net/phy/phy-core.c @@ -648,11 +648,17 @@ static int __phy_read_page(struct phy_device *phydev) { + if (!phydev->drv->read_page) + return -EOPNOTSUPP; + return phydev->drv->read_page(phydev); } static int __phy_write_page(struct phy_device *phydev, int page) { + if (!phydev->drv->write_page) + return -EOPNOTSUPP; + return phydev->drv->write_page(phydev, page); } --- a/drivers/net/phy/realtek.c +++ b/drivers/net/phy/realtek.c @@ -214,8 +214,10 @@ * for details). */ oldpage = phy_select_page(phydev, 0x7); - if (oldpage < 0) - goto err_restore_page; + if (oldpage < 0) { + dev_warn(&phydev->mdio.dev, "Unable to set rgmii delays\n"); + return 0; + } ret = phy_write(phydev, RTL821x_EXT_PAGE_SELECT, 0xa4); if (ret)