Message ID | 1477305654-11328-3-git-send-email-zefir.kurtisi@neratec.com |
---|---|
State | Accepted, archived |
Delegated to: | David Miller |
Headers | show |
From: Zefir Kurtisi <zefir.kurtisi@neratec.com> Date: Mon, 24 Oct 2016 12:40:54 +0200 > In SGMII mode, we observed an autonegotiation issue > after power-down-up cycles where the copper side > reports successful link establishment but the > SGMII side's link is down. > > This happened in a setup where the at8031 is > connected over SGMII to a eTSEC (fsl gianfar), > but so far could not be reproduced with other > Ethernet device / driver combinations. > > This commit adds a wrapper function for at8031 > that in case of operating in SGMII mode double > checks SGMII link state when generic aneg_done() > succeeds. It prints a warning on failure but > intentionally does not try to recover from this > state. As a result, if you ever see a warning > '803x_aneg_done: SGMII link is not ok' you will > end up having an Ethernet link up but won't get > any data through. This should not happen, if it > does, please contact the module maintainer. > > Signed-off-by: Zefir Kurtisi <zefir.kurtisi@neratec.com> Applied.
Zefir Kurtisi wrote: > + /* check if the SGMII link is OK. */ > + if (!(phy_read(phydev, AT803X_PSSR) & AT803X_PSSR_MR_AN_COMPLETE)) { > + pr_warn("803x_aneg_done: SGMII link is not ok\n"); > + aneg_done = 0; I see this message appear sometimes when bring up the interface via ifup. However, contrary to your patch description, everything seems to work: $ iperf3 -c 192.168.1.1 -t 3600 Connecting to host 192.168.1.1, port 5201 [ 4] local 192.168.1.2 port 52640 connected to 192.168.1.1 port 5201 [ ID] Interval Transfer Bandwidth Retr Cwnd [ 4] 0.00-1.00 sec 109 MBytes 909 Mbits/sec 0 485 KBytes [ 4] 1.00-2.00 sec 108 MBytes 902 Mbits/sec 0 622 KBytes I wonder if you're impacted with all of the pause frame problems I'm having.
On 10/29/2016 12:24 AM, Timur Tabi wrote: > Zefir Kurtisi wrote: >> + /* check if the SGMII link is OK. */ >> + if (!(phy_read(phydev, AT803X_PSSR) & AT803X_PSSR_MR_AN_COMPLETE)) { >> + pr_warn("803x_aneg_done: SGMII link is not ok\n"); >> + aneg_done = 0; > > I see this message appear sometimes when bring up the interface via ifup. > However, contrary to your patch description, everything seems to work: > Right so, seeing the message and still having the SGMII link up and working is what happens most. But randomly (in our setup it is in the order of 1%) we see that SGMII remains down after a suspend-resume cycle. This message is a required but not sufficient condition. > $ iperf3 -c 192.168.1.1 -t 3600 > Connecting to host 192.168.1.1, port 5201 > [ 4] local 192.168.1.2 port 52640 connected to 192.168.1.1 port 5201 > [ ID] Interval Transfer Bandwidth Retr Cwnd > [ 4] 0.00-1.00 sec 109 MBytes 909 Mbits/sec 0 485 KBytes > [ 4] 1.00-2.00 sec 108 MBytes 902 Mbits/sec 0 622 KBytes > > I wonder if you're impacted with all of the pause frame problems I'm having. > I doubt, since resetting SGMII link alone (via toggling MII_BMCR on fibre page) brings back the SGMII communication, I assume it to be below the MAC layer. Are you aware of any related erratas for the at803x? I am not and therefore left alone with my guesswork.
On 10/24/2016 05:40 AM, Zefir Kurtisi wrote: > As a result, if you ever see a warning > '803x_aneg_done: SGMII link is not ok' you will > end up having an Ethernet link up but won't get > any data through. This should not happen, if it > does, please contact the module maintainer. I am now seeing this: ubuntu@ubuntu:~$ ifup eth1 ubuntu@ubuntu:~$ [ 588.687689] 803x_aneg_done: SGMII link is not ok [ 588.694909] qcom-emac QCOM8070:00 eth1: Link is Up - 1Gbps/Full - flow control rx/tx [ 588.703985] qcom-emac QCOM8070:00 eth1: Link is Up - 1Gbps/Full - flow control rx/tx ubuntu@ubuntu:~$ ping 192.168.3.1 PING 192.168.3.1 (192.168.3.1) 56(84) bytes of data. 64 bytes from 192.168.3.1: icmp_seq=1 ttl=64 time=0.502 ms 64 bytes from 192.168.3.1: icmp_seq=2 ttl=64 time=0.244 ms 64 bytes from 192.168.3.1: icmp_seq=3 ttl=64 time=0.220 ms ^C --- 192.168.3.1 ping statistics --- 3 packets transmitted, 3 received, 0% packet loss, time 2107ms rtt min/avg/max/mdev = 0.220/0.322/0.502/0.127 ms So I do get the "SGMII link is not ok" message, but my connection is fine. I don't know why the link-up message is displayed twice. It's only displayed once if I use the genphy driver instead of the at803x driver. I'm going to debug the at803x to see what it does that causes the double link-up message.
On 01/18/2017 12:32 AM, Timur Tabi wrote: > On 10/24/2016 05:40 AM, Zefir Kurtisi wrote: >> As a result, if you ever see a warning >> '803x_aneg_done: SGMII link is not ok' you will >> end up having an Ethernet link up but won't get >> any data through. This should not happen, if it >> does, please contact the module maintainer. > > I am now seeing this: > > ubuntu@ubuntu:~$ ifup eth1 > ubuntu@ubuntu:~$ [ 588.687689] 803x_aneg_done: SGMII link is not ok > [ 588.694909] qcom-emac QCOM8070:00 eth1: Link is Up - 1Gbps/Full - flow control > rx/tx > [ 588.703985] qcom-emac QCOM8070:00 eth1: Link is Up - 1Gbps/Full - flow control > rx/tx > > ubuntu@ubuntu:~$ ping 192.168.3.1 > PING 192.168.3.1 (192.168.3.1) 56(84) bytes of data. > 64 bytes from 192.168.3.1: icmp_seq=1 ttl=64 time=0.502 ms > 64 bytes from 192.168.3.1: icmp_seq=2 ttl=64 time=0.244 ms > 64 bytes from 192.168.3.1: icmp_seq=3 ttl=64 time=0.220 ms > ^C > --- 192.168.3.1 ping statistics --- > 3 packets transmitted, 3 received, 0% packet loss, time 2107ms > rtt min/avg/max/mdev = 0.220/0.322/0.502/0.127 ms > > So I do get the "SGMII link is not ok" message, but my connection is fine. I > don't know why the link-up message is displayed twice. It's only displayed once > if I use the genphy driver instead of the at803x driver. > > I'm going to debug the at803x to see what it does that causes the double link-up > message. > The fact that you see the warning means external autoneg completes before the SGMII side in best case or SGMII link remains down in worst case. To prevent this, I am using a private variant of the at8031 driver that ensures that the SGMII autoneg is never restarted. If you ever end up with a dead link, feel free to test with the related functions below. Cheers, Zefir --- static int nt_at8031_no_soft_reset(struct phy_device *phydev) { return 0; } /* * Powering the chip down occasionally causes SGMII link loss, which in turn * causes the connection to gianfar to remain down. * * To prevent permanent link loss, instead of power down just isolate pins. */ static int nt_at8031_suspend(struct phy_device *phydev) { mutex_lock(&phydev->lock); phy_write(phydev, MII_BMCR, phy_read(phydev, MII_BMCR) | BMCR_ISOLATE); mutex_unlock(&phydev->lock); return 0; } static int nt_at8031_resume(struct phy_device *phydev) { mutex_lock(&phydev->lock); phy_write(phydev, MII_BMCR, phy_read(phydev, MII_BMCR) & ~BMCR_ISOLATE); mutex_unlock(&phydev->lock); return 0; }
Zefir Kurtisi wrote: > The fact that you see the warning means external autoneg completes before the > SGMII side in best case or SGMII link remains down in worst case. So I'm no expert on this. Are you saying that I might possibly be doing things backwards in my driver? That is, I should be configuring the SGMII side of the link before I start autonegotiation?
On 01/18/2017 02:13 PM, Timur Tabi wrote: > Zefir Kurtisi wrote: >> The fact that you see the warning means external autoneg completes before the >> SGMII side in best case or SGMII link remains down in worst case. > > So I'm no expert on this. Are you saying that I might possibly be doing things > backwards in my driver? That is, I should be configuring the SGMII side of the > link before I start autonegotiation? > No, not necessarily. The SGMII link default configuration is set such that you do not have to bother at all. That is, if in your case you see the warning popping up but the link always regains connection, then it is an ignorable false positive.
On 01/18/2017 07:53 AM, Zefir Kurtisi wrote: > No, not necessarily. The SGMII link default configuration is set such that you do > not have to bother at all. Does the SGMII link need to make the speed and duplex of the external link? https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/drivers/net/ethernet/qualcomm/emac/emac-sgmii.c#n50 Here is where I can configure the SGMII link to match whatever phydev says the external link is. This is code that was handed down to me. I've never really understood what the purpose was. Shouldn't I just be able to configure the SGMII link to 1Gbs full-duplex, regardless of what the external link is set to? > That is, if in your case you see the warning popping up but the link always > regains connection, then it is an ignorable false positive. Unfortunately, I can't really ignore it because genphy does this: $ ifup eth1 [ 1034.754276] qcom-emac QCOM8070:00 eth1: Link is Up - 1Gbps/Full - flow control rx/tx But the at803x driver does this: $ ifup eth1 [ 1020.507728] 803x_aneg_done: SGMII link is not ok [ 1020.513517] qcom-emac QCOM8070:00 eth1: Link is Up - 1Gbps/Full - flow control rx/tx [ 1020.522839] qcom-emac QCOM8070:00 eth1: Link is Up - 1Gbps/Full - flow control rx/tx Customers are going to complain.
On 01/18/2017 04:02 PM, Timur Tabi wrote: > On 01/18/2017 07:53 AM, Zefir Kurtisi wrote: >> No, not necessarily. The SGMII link default configuration is set such that you do >> not have to bother at all. > > Does the SGMII link need to make the speed and duplex of the external link? > > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/drivers/net/ethernet/qualcomm/emac/emac-sgmii.c#n50 > > > Here is where I can configure the SGMII link to match whatever phydev says the > external link is. This is code that was handed down to me. I've never really > understood what the purpose was. Shouldn't I just be able to configure the SGMII > link to 1Gbs full-duplex, regardless of what the external link is set to? > This is because the SGMII link is a transparent interface to the upper layer with no means to inform the other side of the link type in the lower layer. It always operates at 675MHz, which with two lines gives 1.25Gbps, which at 10/8 coding gives exactly 1Gbps net data rate. If the at803x's copper side autonegotiates to 1Gbps, the bits traversing over the SGMII match the copper side 1:1. In case the copper side autonegotiates to e.g. 100Mbps, each bit at the copper side on the SGMII bus is replicated and sent 10x times - or 100x times in case of 10Mbps. The MAC side of the ETH needs to be aware of how the SGMII data has to be interpreted, and this is why you have to set the bits you are referring to. >> That is, if in your case you see the warning popping up but the link always >> regains connection, then it is an ignorable false positive. > > Unfortunately, I can't really ignore it because genphy does this: > > $ ifup eth1 > [ 1034.754276] qcom-emac QCOM8070:00 eth1: Link is Up - 1Gbps/Full - flow control > rx/tx > > But the at803x driver does this: > > $ ifup eth1 > [ 1020.507728] 803x_aneg_done: SGMII link is not ok > [ 1020.513517] qcom-emac QCOM8070:00 eth1: Link is Up - 1Gbps/Full - flow control > rx/tx > [ 1020.522839] qcom-emac QCOM8070:00 eth1: Link is Up - 1Gbps/Full - flow control > rx/tx > > Customers are going to complain. > Yes, the double link-status message is annoying - I was referring to the other message. To track down who is causing the additional message, I would proceed with following technique that helped me dig down a similar problem: since you control the events in question and there is no risk of flooding the kernel log, in the top of phy.c::phy_print_status() add a dump_stack() call. In the debug log ensure that all of the traces end up in the same phydev->adjust_link() callback handler (in your case emac_adjust_link()). In the gianfar's handler there is an additional check whether the link really changed before phy_print_status() is called, in emac_adjust_link() this happens unconditionally - maybe that is the problem you are facing. @Florian: is it safe to assume that when phydev->adjust_link() is called there was in fact a link change (link, duplex, pause), or does the handler have to double check for a change? I see some ETH drivers maintain a private old state instance for that, while others don't. Cheers, Zefir
On 01/19/2017 01:43 AM, Zefir Kurtisi wrote: > On 01/18/2017 04:02 PM, Timur Tabi wrote: >> On 01/18/2017 07:53 AM, Zefir Kurtisi wrote: >>> No, not necessarily. The SGMII link default configuration is set such that you do >>> not have to bother at all. >> >> Does the SGMII link need to make the speed and duplex of the external link? >> >> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/drivers/net/ethernet/qualcomm/emac/emac-sgmii.c#n50 >> >> >> Here is where I can configure the SGMII link to match whatever phydev says the >> external link is. This is code that was handed down to me. I've never really >> understood what the purpose was. Shouldn't I just be able to configure the SGMII >> link to 1Gbs full-duplex, regardless of what the external link is set to? >> > > This is because the SGMII link is a transparent interface to the upper layer with > no means to inform the other side of the link type in the lower layer. > > It always operates at 675MHz, which with two lines gives 1.25Gbps, which at 10/8 > coding gives exactly 1Gbps net data rate. If the at803x's copper side > autonegotiates to 1Gbps, the bits traversing over the SGMII match the copper side > 1:1. In case the copper side autonegotiates to e.g. 100Mbps, each bit at the > copper side on the SGMII bus is replicated and sent 10x times - or 100x times in > case of 10Mbps. The MAC side of the ETH needs to be aware of how the SGMII data > has to be interpreted, and this is why you have to set the bits you are referring to. > >>> That is, if in your case you see the warning popping up but the link always >>> regains connection, then it is an ignorable false positive. >> >> Unfortunately, I can't really ignore it because genphy does this: >> >> $ ifup eth1 >> [ 1034.754276] qcom-emac QCOM8070:00 eth1: Link is Up - 1Gbps/Full - flow control >> rx/tx >> >> But the at803x driver does this: >> >> $ ifup eth1 >> [ 1020.507728] 803x_aneg_done: SGMII link is not ok >> [ 1020.513517] qcom-emac QCOM8070:00 eth1: Link is Up - 1Gbps/Full - flow control >> rx/tx >> [ 1020.522839] qcom-emac QCOM8070:00 eth1: Link is Up - 1Gbps/Full - flow control >> rx/tx >> >> Customers are going to complain. >> > Yes, the double link-status message is annoying - I was referring to the other > message. > > To track down who is causing the additional message, I would proceed with > following technique that helped me dig down a similar problem: since you control > the events in question and there is no risk of flooding the kernel log, in the top > of phy.c::phy_print_status() add a dump_stack() call. In the debug log ensure that > all of the traces end up in the same phydev->adjust_link() callback handler (in > your case emac_adjust_link()). > > In the gianfar's handler there is an additional check whether the link really > changed before phy_print_status() is called, in emac_adjust_link() this happens > unconditionally - maybe that is the problem you are facing. > > @Florian: is it safe to assume that when phydev->adjust_link() is called there was > in fact a link change (link, duplex, pause), or does the handler have to double > check for a change? I see some ETH drivers maintain a private old state instance > for that, while others don't. Yes, it is not just safe, it is a contract. The reason most drivers cache the values from one run on adjust_link to the other is because you don't want to needlessly read/write registers if nothing has changed, or just a subset of link parameters did change. NB: at some point I was considering bringing this caching into the core PHY library to save some housekeeping code in drivers...
Zefir Kurtisi wrote: > It always operates at 675MHz, which with two lines gives 1.25Gbps, > which at 10/8 coding gives exactly 1Gbps net data rate. If the > at803x's copper side autonegotiates to 1Gbps, the bits traversing > over the SGMII match the copper side 1:1. In case the copper side > autonegotiates to e.g. 100Mbps, each bit at the copper side on the > SGMII bus is replicated and sent 10x times - or 100x times in case of > 10Mbps. The MAC side of the ETH needs to be aware of how the SGMII > data has to be interpreted, and this is why you have to set the bits > you are referring to. So does this mean that the SGMII link should not be autonegotiated? I currently have this code: if (phydev->autoneg == AUTONEG_ENABLE) { val &= ~(FORCE_AN_RX_CFG | FORCE_AN_TX_CFG); val |= AN_ENABLE; writel(val, phy->base + EMAC_SGMII_PHY_AUTONEG_CFG2); } else { ... So if the external PHY is set to autonegotiate, then the SGMII block is set to also negotiate. Now that I think about it, this seems wrong. And in fact, I'm not sure how it works. It seems that the this only makes sense if the SGMII block is configured to act as the only PHY. This is an option that the hardware supports but my driver does not. So perhaps I should remove this part, and just do the rest: u32 speed_cfg; switch (phydev->speed) { case SPEED_10: speed_cfg = SPDMODE_10; break; case SPEED_100: speed_cfg = SPDMODE_100; break; case SPEED_1000: speed_cfg = SPDMODE_1000; break; default: return -EINVAL; } if (phydev->duplex == DUPLEX_FULL) speed_cfg |= DUPLEX_MODE; val &= ~AN_ENABLE; writel(speed_cfg, phy->base + EMAC_SGMII_PHY_SPEED_CFG1); writel(val, phy->base + EMAC_SGMII_PHY_AUTONEG_CFG2); Should I be doing this all the time? > To track down who is causing the additional message, I would proceed > with following technique that helped me dig down a similar problem: > since you control the events in question and there is no risk of > flooding the kernel log, in the top of phy.c::phy_print_status() add > a dump_stack() call. In the debug log ensure that all of the traces > end up in the same phydev->adjust_link() callback handler (in your > case emac_adjust_link()). That's a good idea, thanks.
On 01/20/2017 03:38 AM, Timur Tabi wrote: > Zefir Kurtisi wrote: >> It always operates at 675MHz, which with two lines gives 1.25Gbps, >> which at 10/8 coding gives exactly 1Gbps net data rate. If the >> at803x's copper side autonegotiates to 1Gbps, the bits traversing >> over the SGMII match the copper side 1:1. In case the copper side >> autonegotiates to e.g. 100Mbps, each bit at the copper side on the >> SGMII bus is replicated and sent 10x times - or 100x times in case of >> 10Mbps. The MAC side of the ETH needs to be aware of how the SGMII >> data has to be interpreted, and this is why you have to set the bits >> you are referring to. > > So does this mean that the SGMII link should not be autonegotiated? I currently > have this code: > > if (phydev->autoneg == AUTONEG_ENABLE) { > val &= ~(FORCE_AN_RX_CFG | FORCE_AN_TX_CFG); > val |= AN_ENABLE; > writel(val, phy->base + EMAC_SGMII_PHY_AUTONEG_CFG2); > } else { > ... > > So if the external PHY is set to autonegotiate, then the SGMII block is set to > also negotiate. Now that I think about it, this seems wrong. And in fact, I'm not > sure how it works. It seems that the this only makes sense if the SGMII block is > configured to act as the only PHY. This is an option that the hardware supports > but my driver does not. So perhaps I should remove this part, and just do the rest: > > > u32 speed_cfg; > > switch (phydev->speed) { > case SPEED_10: > speed_cfg = SPDMODE_10; > break; > case SPEED_100: > speed_cfg = SPDMODE_100; > break; > case SPEED_1000: > speed_cfg = SPDMODE_1000; > break; > default: > return -EINVAL; > } > > if (phydev->duplex == DUPLEX_FULL) > speed_cfg |= DUPLEX_MODE; > > val &= ~AN_ENABLE; > writel(speed_cfg, phy->base + EMAC_SGMII_PHY_SPEED_CFG1); > writel(val, phy->base + EMAC_SGMII_PHY_AUTONEG_CFG2); > > Should I be doing this all the time? > Hm, that might be product dependent. From my experience with the gianfar/at8031 this is what I can share: there is a known flaw in some revisions of the eTSEC that causes random failures in SGMII autonegotiation. As a workaround FSL proposes to use fixed speed. I spent lots of time getting this working, but in the end it turned out it does not. Setting fixed Gbps speed on both ends of the link is reported to be ok (link failure bits clean, page transmitted ok), but no data goes through the link. From a source I can't remember atm I was told that autonegotiation is mandatory for SGMII, therefore I believe there is no need to change the autoneg bits for SGMII in your case. I'd assume that resetting the link before setting the speed value should be enough, i.e. the second write to EMAC_SGMII_PHY_AUTONEG_CFG2 might be useless. YMMV >> To track down who is causing the additional message, I would proceed >> with following technique that helped me dig down a similar problem: >> since you control the events in question and there is no risk of >> flooding the kernel log, in the top of phy.c::phy_print_status() add >> a dump_stack() call. In the debug log ensure that all of the traces >> end up in the same phydev->adjust_link() callback handler (in your >> case emac_adjust_link()). > > That's a good idea, thanks. > As Florian responded, it seems you need to double check for real changes before printing the link status to get rid of the double printed notifications. Good Luck.
On 10/24/2016 05:40 AM, Zefir Kurtisi wrote: > This commit adds a wrapper function for at8031 > that in case of operating in SGMII mode double > checks SGMII link state when generic aneg_done() > succeeds. It prints a warning on failure but > intentionally does not try to recover from this > state. As a result, if you ever see a warning > '803x_aneg_done: SGMII link is not ok' you will > end up having an Ethernet link up but won't get > any data through. This should not happen, if it > does, please contact the module maintainer. I'm getting bitten by this one again. We're now have several systems that are reporting the link failure ("803x_aneg_done: SGMII link is not ok"), and the interface comes up but is not functional. I believe this is expected. The problem, however, is not because of the link failure. Instead, the problem is this: > + /* check if the SGMII link is OK. */ > + if (!(phy_read(phydev, AT803X_PSSR) & AT803X_PSSR_MR_AN_COMPLETE)) { > + pr_warn("803x_aneg_done: SGMII link is not ok\n"); > + aneg_done = 0; Returning zero is what breaks the interface. If I comment-out this last line, so that at803x_aneg_done() returns BMSR_ANEGCOMPLETE instead, then everything works. The documentation for phy_aneg_done() says this: * Description: Return the auto-negotiation status from this @phydev * Returns > 0 on success or < 0 on error. 0 means that auto-negotiation * is still pending. So I think there are two issues here: 1. What exactly is supposed to happen when phy_aneg_done() returns a zero? On our system, returning a zero results in a broken link, even though there are no errors reported. I just can't send any packets. 2. I'm preparing a patch that adds a command-line parameter to at803x that makes this code conditional. If you specify the parameter ("linkcheck") then it will check the link and return 0 on failure. Otherwise, it will return whether genphy_aneg_done() returns. The question is, should it still print the message? What I cannot determine is whether or not the link is actually okay. It appears to me that the driver says the link is not ok, but in truth it actually is, and maybe the whole at803x_aneg_done() function based on a false premise.
> 2. I'm preparing a patch that adds a command-line parameter to at803x that > makes this code conditional. FYI: A patch with a command line argument, i think you actually mean a module argument, is very likely to be rejected. Andrew
On Mon, May 22, 2017 at 03:12:03PM -0500, Timur Tabi wrote: > On 10/24/2016 05:40 AM, Zefir Kurtisi wrote: > > This commit adds a wrapper function for at8031 > > that in case of operating in SGMII mode double > > checks SGMII link state when generic aneg_done() > > succeeds. It prints a warning on failure but > > intentionally does not try to recover from this > > state. As a result, if you ever see a warning > > '803x_aneg_done: SGMII link is not ok' you will > > end up having an Ethernet link up but won't get > > any data through. This should not happen, if it > > does, please contact the module maintainer. > > I'm getting bitten by this one again. We're now have several systems that > are reporting the link failure ("803x_aneg_done: SGMII link is not ok"), and > the interface comes up but is not functional. I believe this is expected. > > The problem, however, is not because of the link failure. Instead, the > problem is this: > > > + /* check if the SGMII link is OK. */ > > + if (!(phy_read(phydev, AT803X_PSSR) & AT803X_PSSR_MR_AN_COMPLETE)) { > > + pr_warn("803x_aneg_done: SGMII link is not ok\n"); > > + aneg_done = 0; > > Returning zero is what breaks the interface. If I comment-out this last > line, so that at803x_aneg_done() returns BMSR_ANEGCOMPLETE instead, then > everything works. Are you using interrupts? Or polling? If polling, it should come back again 1 second later and see if auto-neg has completed. Hopefully the SGMII side comes up eventually. If you are using interrupts, you need another interrupt when the SGMII side comes up, otherwise i think the state machine is stuck waiting. Andrew
On 05/22/2017 02:02 PM, Andrew Lunn wrote: >> 2. I'm preparing a patch that adds a command-line parameter to at803x that >> makes this code conditional. > > FYI: > > A patch with a command line argument, i think you actually mean a > module argument, is very likely to be rejected. Even a module argument would be rejected. If you need platform/SoC specific behavior propagated down to the PHY driver, several options exist: - pass an agreed upon value for phy_flags to of_phy_connect() see drivers/net/ethernet/broadcom/tg3.c and drivers/net/ethernet/broadcom/genet/bcmgenet.c for instance and update the driver to act on that "flags" see drivers/net/phy/broadcom.c and drivers/net/phy/bcm7xxx.c - register a PHY fixup which is specific to the board/SoC, and have the PHY fixup do whatever is necessary for your platform (like setting specific registers) Preference goes for the first solution, but phy_flags is just a 32-bits integer, so you could run out of bits.
On 05/22/2017 04:10 PM, Florian Fainelli wrote: > Even a module argument would be rejected. If you need platform/SoC > specific behavior propagated down to the PHY driver, several options exist: > > - pass an agreed upon value for phy_flags to of_phy_connect() see > drivers/net/ethernet/broadcom/tg3.c and > drivers/net/ethernet/broadcom/genet/bcmgenet.c for instance and update > the driver to act on that "flags" see drivers/net/phy/broadcom.c and > drivers/net/phy/bcm7xxx.c Will this work on ACPI systems as well? I call phy_connect_direct() instead of of_phy_connect(). I see some drivers set phydev->dev_flags before calling phy_connect_direct(). My concern is that this problem occurs only on an at8031 chip, so having my network driver passing an at8031-specific flag seems out of place. What happens if, on some other board, a different PHY is used, and that flag means something else? > - register a PHY fixup which is specific to the board/SoC, and have the > PHY fixup do whatever is necessary for your platform (like setting > specific registers) Do you have an example of that?
On 05/22/2017 04:09 PM, Andrew Lunn wrote: > Are you using interrupts? Or polling? adpt->phydev->irq = PHY_IGNORE_INTERRUPT; ret = phy_connect_direct(netdev, adpt->phydev, emac_adjust_link, PHY_INTERFACE_MODE_SGMII); Technically it's polling, except that it's my NIC's hardware that is polling the MDIO bus, and then generating an interrupt when there's a link state change. When the link state changes, I call phy_mac_interrupt() if (status & ISR_GPHY_LINK) phy_mac_interrupt(adpt->phydev, !!(status & GPHY_LINK_UP_INT)); > If polling, it should come back again 1 second later and see if > auto-neg has completed. Hopefully the SGMII side comes up eventually. > > If you are using interrupts, you need another interrupt when the SGMII > side comes up, otherwise i think the state machine is stuck waiting. I'll have to test this, but what do I do if I don't get another interrupt? I have a suspicion that the link is actually okay, and that the error is bogus.
> I'll have to test this, but what do I do if I don't get another interrupt?
It probably means interrupts cannot be used. Poll it.
Andrew
On 05/22/2017 02:19 PM, Timur Tabi wrote: > On 05/22/2017 04:10 PM, Florian Fainelli wrote: >> Even a module argument would be rejected. If you need platform/SoC >> specific behavior propagated down to the PHY driver, several options exist: >> >> - pass an agreed upon value for phy_flags to of_phy_connect() see >> drivers/net/ethernet/broadcom/tg3.c and >> drivers/net/ethernet/broadcom/genet/bcmgenet.c for instance and update >> the driver to act on that "flags" see drivers/net/phy/broadcom.c and >> drivers/net/phy/bcm7xxx.c > > Will this work on ACPI systems as well? Provided you get a reference on the PHY dvice first, yes. > I call phy_connect_direct() instead > of of_phy_connect(). I see some drivers set phydev->dev_flags before > calling phy_connect_direct(). Setting phydev->dev_flags before calling phy_connect_direct() is okay and will work here. The key thing is that you will need to get a PHY device reference before, which would already be populated in your MDIO bus's mdio_map array, see e.g: mdiobus_get_phy(). You can also set phydev->dev_flags *after* calling phy_connect_direct(). The only reason why it should be done before, is to guarantee that phydrv::config_init would *see* the correct value there. If you need it at a later time, like in config_aneg() or aneg_done(), setting it later *might work*, but you'd have to trigger a auto-negotiation restart to avoid races between phy_connect_direct() returning, and phydrv::config_aneg() being called. > > My concern is that this problem occurs only on an at8031 chip, so having my > network driver passing an at8031-specific flag seems out of place. What > happens if, on some other board, a different PHY is used, and that flag > means something else? There needs to be an agreement on what the flags bits mean, and this needs to be in a shared header file that other network drivers can reference and where they can allocate their own bits if existing functionality is not covered. The allocation of such flags is centered around the perspective of the PHY driver. Of course, this only works if you have a strict mapping between the Ethernet MAC and the PHY, and if your MAC only uses the same PHY device driver. If that's not the case, you need to make sure you don't pass a phy_flags with bits set that could influence the behavior of another PHY driver that would also try to do something with these phy_flags. Fairly positive you can figure this out from your Ethernet MAC driver. > >> - register a PHY fixup which is specific to the board/SoC, and have the >> PHY fixup do whatever is necessary for your platform (like setting >> specific registers) > > Do you have an example of that? You could have grepped for phy_register_fixup() but the best example I can come up with is: drivers/net/usb/lan78xx.c and there are a lot more in arch/{arm,powerpc}/: arch/arm/mach-davinci/board-dm644x-evm.c: phy_register_fixup_for_uid(LXT971_PHY_ID, LXT971_PHY_MASK, arch/arm/mach-imx/mach-imx6q.c: phy_register_fixup_for_uid(PHY_ID_KSZ9021, MICREL_PHY_ID_MASK, arch/arm/mach-imx/mach-imx6q.c: phy_register_fixup_for_uid(PHY_ID_KSZ9031, MICREL_PHY_ID_MASK, arch/arm/mach-imx/mach-imx6q.c: phy_register_fixup_for_uid(PHY_ID_AR8031, 0xffffffef, arch/arm/mach-imx/mach-imx6q.c: phy_register_fixup_for_uid(PHY_ID_AR8035, 0xffffffef, arch/arm/mach-imx/mach-imx6sx.c: phy_register_fixup_for_uid(PHY_ID_AR8031, 0xffffffff, arch/arm/mach-imx/mach-imx6ul.c: phy_register_fixup_for_uid(PHY_ID_KSZ8081, MICREL_PHY_ID_MASK, arch/arm/mach-imx/mach-imx7d.c: phy_register_fixup_for_uid(PHY_ID_AR8031, 0xffffffff, arch/arm/mach-imx/mach-imx7d.c: phy_register_fixup_for_uid(PHY_ID_BCM54220, 0xffffffff, arch/arm/mach-mxs/mach-mxs.c: phy_register_fixup_for_uid(PHY_ID_KSZ8051, MICREL_PHY_ID_MASK, arch/arm/mach-orion5x/dns323-setup.c: phy_register_fixup_for_uid(MARVELL_PHY_ID_88E1118, arch/powerpc/platforms/85xx/mpc85xx_mds.c: phy_register_fixup_for_id(phy_id, mpc8568_fixup_125_clock); arch/powerpc/platforms/85xx/mpc85xx_mds.c: phy_register_fixup_for_id(phy_id, mpc8568_mds_phy_fixups); arch/powerpc/platforms/85xx/mpc85xx_mds.c: phy_register_fixup_for_id(phy_id, mpc8568_mds_phy_fixups);
On 05/22/2017 04:32 PM, Andrew Lunn wrote: >> I'll have to test this, but what do I do if I don't get another interrupt? > It probably means interrupts cannot be used. Poll it. I will test that to see what happens, but I believe the real problem is that the at803x driver is lying when it says that the link is not okay. I think the link is okay, and that's why I'm not getting any more interrupts. I don't think I should have to drop interrupt support in my MAC driver because one specific PHY driver is broken.
On Tue, May 23, 2017 at 10:54:57AM -0500, Timur Tabi wrote: > On 05/22/2017 04:32 PM, Andrew Lunn wrote: > >> I'll have to test this, but what do I do if I don't get another interrupt? > > It probably means interrupts cannot be used. Poll it. > > I will test that to see what happens, but I believe the real problem is that > the at803x driver is lying when it says that the link is not okay. I think > the link is okay, and that's why I'm not getting any more interrupts. I > don't think I should have to drop interrupt support in my MAC driver because > one specific PHY driver is broken. If it turns out the PHY hardware is broken, the phy driver itself can force it back to polling by setting phydev->irq to PHY_POLL in its probe() function. Andrew
On 05/23/2017 11:07 AM, Andrew Lunn wrote: >> > I will test that to see what happens, but I believe the real problem is that >> > the at803x driver is lying when it says that the link is not okay. I think >> > the link is okay, and that's why I'm not getting any more interrupts. I >> > don't think I should have to drop interrupt support in my MAC driver because >> > one specific PHY driver is broken. > If it turns out the PHY hardware is broken, the phy driver itself can > force it back to polling by setting phydev->irq to PHY_POLL in its > probe() function. I don't think the hardware is broken, I think the driver is broken. The patch that sets aneg_done to 0 should be reverted or restricted somehow. Even the developer of the patch admits that if the warning message is displayed, the link will appear to be up, but no packets will go through. Perhaps that's because the driver is returning 0 instead of BMSR_ANEGCOMPLETE? Would it be okay for the PHY driver to query a property from the device tree directly (e.g. "qca,check-sgmii-link"), and if present, only then implement the sgmii link check? So in at803x_probe(), I would do something like this: if (device_property_read_bool(&phydev->mdio.dev, "qca,check-sgmii-link") priv->check_sgmii_link = true;
On 23/05/17 18:33, Timur Tabi wrote: > On 05/23/2017 11:07 AM, Andrew Lunn wrote: >>>> I will test that to see what happens, but I believe the real problem is that >>>> the at803x driver is lying when it says that the link is not okay. I think >>>> the link is okay, and that's why I'm not getting any more interrupts. I >>>> don't think I should have to drop interrupt support in my MAC driver because >>>> one specific PHY driver is broken. >> If it turns out the PHY hardware is broken, the phy driver itself can >> force it back to polling by setting phydev->irq to PHY_POLL in its >> probe() function. > > I don't think the hardware is broken, I think the driver is broken. The > patch that sets aneg_done to 0 should be reverted or restricted somehow. > > Even the developer of the patch admits that if the warning message is > displayed, the link will appear to be up, but no packets will go through. > Perhaps that's because the driver is returning 0 instead of BMSR_ANEGCOMPLETE? > > Would it be okay for the PHY driver to query a property from the device tree > directly (e.g. "qca,check-sgmii-link"), and if present, only then implement > the sgmii link check? So in at803x_probe(), I would do something like this: > > if (device_property_read_bool(&phydev->mdio.dev, > "qca,check-sgmii-link") > priv->check_sgmii_link = true; > Zefir is currently on holiday and will probably get these emails when he gets back around 1. June I think you missunderstand what "the link appears up but no packets go through means". There are 2 pages which each reflect a side of the PHY: * copper side * SGMII side Until the patch only the copper side was ever considered when reporting the state of the link. This lead to the situation that the copper side was up (and the link reported up), but the SGMII side didn't come up. It could well be a bad combination of CPU and the AR8031/33. We know of at least 1 CPU (Freescale P1010) which has erratas for certain revisions regarding this. If the SGMII side doesn't come, well no packets can go through. With the patch: When the copper side is seen as up, it also checks if aneg of the SGMII link is done. As far as i know SGMII can not be run without aneg, since it is always Gbit with aneg mandatory. If SGMII aneg is not done, then, well aneg is not done and thus 0 is returned. Internally we have this patch extended so we don't only report that aneg is not done but also reset the link. Eventually aneg on the SGMII side can be completed and the link comes up. Why do you think that frames are able to go through when aneg is reported as not done by the PHY? Since aneg is mandatory for SGMII this can as well be seen as "link not up", not? BR Matthias
On 5/24/17 2:18 AM, Matthias May wrote: > With the patch: When the copper side is seen as up, it also checks if aneg of the SGMII link is done. > As far as i know SGMII can not be run without aneg, since it is always Gbit with aneg mandatory. > If SGMII aneg is not done, then, well aneg is not done and thus 0 is returned. > > Internally we have this patch extended so we don't only report that aneg is not done but also reset the link. > Eventually aneg on the SGMII side can be completed and the link comes up. I would really like to test this patch. > Why do you think that frames are able to go through when aneg is reported as not done by the PHY? I have two theories: 1. The warning message is bogus. The link actually is okay, but the driver thinks that it isn't. 2. The link is not okay, but it automatically fixes itself soon after the at803x_aneg_done() finishes. > Since aneg is mandatory for SGMII this can as well be seen as "link not up", not? The problem is that even though the link is up, the driver has returned "0", so the kernel thinks that autonegotiation has not finished. at803x_aneg_done() is never called again, and so I think the kernel is disabling the interface is some secret way.
> >Since aneg is mandatory for SGMII this can as well be seen as "link not up", not? > > The problem is that even though the link is up, the driver has > returned "0", so the kernel thinks that autonegotiation has not > finished. You need to prove this, that the link is not up. Any by link, we mean both the copper and the SGMII link. > at803x_aneg_done() is never called again, and so I think > the kernel is disabling the interface is some secret way. Well, the driver has told the core that the link is not up. So the kernel is waiting for another interrupt indicating the link has gone up. And probably, this second interrupt never happens. And it is not disabling the interface. Since the PHY is still down, the core has not called netif_carrier_on(). Andrew
On 5/24/17 8:40 AM, Andrew Lunn wrote: > You need to prove this, that the link is not up. Any by link, we mean > both the copper and the SGMII link. I can post the log of my iperf run showing that the, when at803x_aneg_done() returns zero, no packets can go through. And then after I change at803x_aneg_done() so that it returns BMSR_ANEGCOMPLETE, then packets do go through. Is that the proof you're looking for? The current work-around that we're using internally is to blacklist the at803x driver. This forces the kernel to use the genphy driver instead. Everything works when we do this. >> at803x_aneg_done() is never called again, and so I think >> the kernel is disabling the interface is some secret way. > > Well, the driver has told the core that the link is not up. So the > kernel is waiting for another interrupt indicating the link has gone > up. And probably, this second interrupt never happens. Exxactly. That's because the link IS up, and so there is no opportunity to receive another interrupt. > And it is not disabling the interface. Since the PHY is still down, > the core has not called netif_carrier_on(). Ok, I should have said "not enabled" instead of "disabled". Thanks.
On Wed, May 24, 2017 at 08:48:04AM -0500, Timur Tabi wrote: > On 5/24/17 8:40 AM, Andrew Lunn wrote: > > >You need to prove this, that the link is not up. Any by link, we mean > >both the copper and the SGMII link. > > I can post the log of my iperf run showing that the, when > at803x_aneg_done() returns zero, no packets can go through. And > then after I change at803x_aneg_done() so that it returns > BMSR_ANEGCOMPLETE, then packets do go through. Is that the proof > you're looking for? No. I would like to see the status of the copper side and the status of the SGMII side, at the point at803x_aneg_done() is called. If the copper side is up, but the SGMII side is down, returning 0 is correct. > Exxactly. That's because the link IS up, and so there is no > opportunity to receive another interrupt. It could be, the copper side is up, but the SGMII side is down, at the point at803x_aneg_done() is called. So it is correctly returning 0. Sometime later the SGMII side goes up, but there is not a second interrupt. Hence the phy core does not know that the full, 2 stage MAC to PHY to peer PHY link is now up. Andrew
On 05/24/2017 09:09 AM, Andrew Lunn wrote: > It could be, the copper side is up, but the SGMII side is down, at the > point at803x_aneg_done() is called. So it is correctly returning > 0. Sometime later the SGMII side goes up, but there is not a second > interrupt. Hence the phy core does not know that the full, 2 stage MAC > to PHY to peer PHY link is now up. Ok, I'm going to debug this some more. It turns out that the MAC side of the SGMII link can send an interrupt when it thinks that auto-negotiation is done. I might be able to use this. What function should my MAC driver call when it wants the phy core to call at803x_aneg_done again to see if autonegotiation is done? Also, is there a way for the MAC driver to know that at803x_aneg_done() previously returned 0, and that it needs to tell the phy core to check again?
On Wed, May 24, 2017 at 01:58:09PM -0500, Timur Tabi wrote: > On 05/24/2017 09:09 AM, Andrew Lunn wrote: > > It could be, the copper side is up, but the SGMII side is down, at the > > point at803x_aneg_done() is called. So it is correctly returning > > 0. Sometime later the SGMII side goes up, but there is not a second > > interrupt. Hence the phy core does not know that the full, 2 stage MAC > > to PHY to peer PHY link is now up. > > Ok, I'm going to debug this some more. It turns out that the MAC side of > the SGMII link can send an interrupt when it thinks that auto-negotiation is > done. I might be able to use this. You can use this for your board. But it still leaves the phy driver broken for everybody else. > What function should my MAC driver call when it wants the phy core to call > at803x_aneg_done again to see if autonegotiation is done? You want to trigger the PHY state machine. There is only one exported API call to do this, phy_mac_interrupt(). But you are supposed to pass the new link state. And your MAC driver has no idea of that, it does not know if the copper side of the PHY is up. So it might be better if you export phy_trigger_machine(). > Also, is there a way for the MAC driver to know that at803x_aneg_done() > previously returned 0, and that it needs to tell the phy core to check again? Not that i know of. The MAC layer is not supposed to be messing around in the PHY layer. However, just triggering the PHY state machine should be enough. Andrew
On 05/24/2017 02:34 PM, Andrew Lunn wrote: >> Ok, I'm going to debug this some more. It turns out that the MAC side of >> the SGMII link can send an interrupt when it thinks that auto-negotiation is >> done. I might be able to use this. > > You can use this for your board. But it still leaves the phy driver > broken for everybody else. Wait, I thought you said the at803x driver was not broken, since it returns 0 when the SGMII side of the link hasn't finished auto-negotiating? >> What function should my MAC driver call when it wants the phy core to call >> at803x_aneg_done again to see if autonegotiation is done? > > You want to trigger the PHY state machine. There is only one exported > API call to do this, phy_mac_interrupt(). But you are supposed to pass > the new link state. And your MAC driver has no idea of that, it does > not know if the copper side of the PHY is up. My NIC has a feature called autopolling where it takes over the MDIO bus and regularly polls the link state. When it detects that the link state has changed, it generates a MAC interrupt. This is when I call phy_mac_interrupt() normally. I think I can use the SGMII interrupt to also call phy_mac_interrupt(). The problem is the from the copper side, the link is already up, so if I call phy_mac_interrupt() again and say the link is up, the phy core is going to ignore that. > So it might be better if you export phy_trigger_machine(). I'll test that, but that does seem a bit hackish. >> Also, is there a way for the MAC driver to know that at803x_aneg_done() >> previously returned 0, and that it needs to tell the phy core to check again? > > Not that i know of. The MAC layer is not supposed to be messing around > in the PHY layer. However, just triggering the PHY state machine > should be enough. Can you tell my how PHY_HAS_INTERRUPT is supposed to work? How does the PHY send an interrupt? I'm starting to think that my NIC's autopolling feature is not compatible with phylib, and that I should use polling mode.
On Wed, May 24, 2017 at 03:57:06PM -0500, Timur Tabi wrote: > On 05/24/2017 02:34 PM, Andrew Lunn wrote: > >>Ok, I'm going to debug this some more. It turns out that the MAC side of > >>the SGMII link can send an interrupt when it thinks that auto-negotiation is > >>done. I might be able to use this. > > > >You can use this for your board. But it still leaves the phy driver > >broken for everybody else. > > Wait, I thought you said the at803x driver was not broken, since it > returns 0 when the SGMII side of the link hasn't finished > auto-negotiating? It is correct so far. But to work, it needs to interrupt again once the SGMII side has come up. Only then have we link. > My NIC has a feature called autopolling where it takes over the MDIO > bus and regularly polls the link state. When it detects that the > link state has changed, it generates a MAC interrupt. This is when > I call phy_mac_interrupt() normally. Unfortunately, you need to keep this feature turned off. It will not respect the phydev mutex. It has no idea what page has been currently selected. It probably has no way to flip the page and see if the SGMII link is up. etc. > Can you tell my how PHY_HAS_INTERRUPT is supposed to work? How does > the PHY send an interrupt? Generally, the PHY interrupt pin is connected to a GPIO. You then use the GPIO as an interrupt source. So it has an interrupt number. Put that in phydev->irq, eg using the interrupts property in device tree. The core will register an interrupt handler, and enable the interrupt. When it receives an interrupt, it calls the phy driver to service the interrupt. Andrew
On 05/24/2017 01:57 PM, Timur Tabi wrote: > On 05/24/2017 02:34 PM, Andrew Lunn wrote: >>> Ok, I'm going to debug this some more. It turns out that the MAC >>> side of >>> the SGMII link can send an interrupt when it thinks that >>> auto-negotiation is >>> done. I might be able to use this. >> >> You can use this for your board. But it still leaves the phy driver >> broken for everybody else. > > Wait, I thought you said the at803x driver was not broken, since it > returns 0 when the SGMII side of the link hasn't finished auto-negotiating? > >>> What function should my MAC driver call when it wants the phy core to >>> call >>> at803x_aneg_done again to see if autonegotiation is done? >> >> You want to trigger the PHY state machine. There is only one exported >> API call to do this, phy_mac_interrupt(). But you are supposed to pass >> the new link state. And your MAC driver has no idea of that, it does >> not know if the copper side of the PHY is up. > > My NIC has a feature called autopolling where it takes over the MDIO bus > and regularly polls the link state. When it detects that the link state > has changed, it generates a MAC interrupt. This is when I call > phy_mac_interrupt() normally. > > I think I can use the SGMII interrupt to also call phy_mac_interrupt(). > The problem is the from the copper side, the link is already up, so if I > call phy_mac_interrupt() again and say the link is up, the phy core is > going to ignore that. The question is what the HW is auto-polling on? Most HW implementations that I have seen either auto-poll and assume that BMSR.LINKSTATUS will reflect what they want to, or they even let you specify which register and bit(s) to monitor for link status. So which one is it here? As Andrew already responded, this clashes with any reads/writes that the PHY library could do, unless your HW is smart enough to serialize MDIO reads/writes? > >> So it might be better if you export phy_trigger_machine(). > > I'll test that, but that does seem a bit hackish. > >>> Also, is there a way for the MAC driver to know that at803x_aneg_done() >>> previously returned 0, and that it needs to tell the phy core to >>> check again? >> >> Not that i know of. The MAC layer is not supposed to be messing around >> in the PHY layer. However, just triggering the PHY state machine >> should be enough. > > Can you tell my how PHY_HAS_INTERRUPT is supposed to work? How does the > PHY send an interrupt? PHY_HAS_INTERRUPT indicates that you have a dedicated interrupt line for your PHY device and that you have the ability to implement a custom interrupt handling callback in the PHY driver (ack_interrupt) that lets you deal with all possible sorts of the interrupts that the PHY could generate. > > I'm starting to think that my NIC's autopolling feature is not > compatible with phylib, and that I should use polling mode. That's pretty much what Andrew told you about 5 emails ago...
On 05/24/2017 04:15 PM, Andrew Lunn wrote: >> My NIC has a feature called autopolling where it takes over the MDIO >> bus and regularly polls the link state. When it detects that the >> link state has changed, it generates a MAC interrupt. This is when >> I call phy_mac_interrupt() normally. > Unfortunately, you need to keep this feature turned off. It will not > respect the phydev mutex. It has no idea what page has been currently > selected. It probably has no way to flip the page and see if the SGMII > link is up. etc. phydev mutex? And what do you mean by page? I forgot one detail. Every time you do an MDIO read/write, it temporarily disables the feature. Although, I think that's not relevant to your point. Disabling this feature and switching from PHY_IGNORE_INTERRUPT to PHY_POLL might fix everything. I will try it.
On 05/24/2017 02:20 PM, Timur Tabi wrote: > On 05/24/2017 04:15 PM, Andrew Lunn wrote: >>> My NIC has a feature called autopolling where it takes over the MDIO >>> bus and regularly polls the link state. When it detects that the >>> link state has changed, it generates a MAC interrupt. This is when >>> I call phy_mac_interrupt() normally. > >> Unfortunately, you need to keep this feature turned off. It will not >> respect the phydev mutex. It has no idea what page has been currently >> selected. It probably has no way to flip the page and see if the SGMII >> link is up. etc. > > phydev mutex? And what do you mean by page? Yes phydev->lock which is used to serialize the state machine state changes. Most PHYs have many more registers than the 15 standard exposed directly, and so you need indirect reads/writes to access these registers, which typically involve switching a particular page, doing the indirect register access, and then flipping the page back. If you interrupt that scheme one way or another, your reads and writes are all messed up. > > I forgot one detail. Every time you do an MDIO read/write, it > temporarily disables the feature. Although, I think that's not relevant > to your point. Is that done by the HW itself, or is this under SW control exclusively. > > Disabling this feature and switching from PHY_IGNORE_INTERRUPT to > PHY_POLL might fix everything. I will try it. > Humm yes, that seems like a worthwhile exercise at least.
On 05/24/2017 04:28 PM, Florian Fainelli wrote: > Yes phydev->lock which is used to serialize the state machine state changes. > > Most PHYs have many more registers than the 15 standard exposed > directly, and so you need indirect reads/writes to access these > registers, which typically involve switching a particular page, doing > the indirect register access, and then flipping the page back. If you > interrupt that scheme one way or another, your reads and writes are all > messed up. Ah, and the at803x is a device like that. At worst, the autopoll feature could read a register from the wrong page, and think that the link state has changed when it hasn't. But that's still bad, and all my problems do revolve around link states. >> I forgot one detail. Every time you do an MDIO read/write, it >> temporarily disables the feature. Although, I think that's not relevant >> to your point. > > Is that done by the HW itself, or is this under SW control exclusively. Software.
On 05/24/2017 02:32 PM, Timur Tabi wrote: > On 05/24/2017 04:28 PM, Florian Fainelli wrote: > >> Yes phydev->lock which is used to serialize the state machine state >> changes. >> >> Most PHYs have many more registers than the 15 standard exposed >> directly, and so you need indirect reads/writes to access these >> registers, which typically involve switching a particular page, doing >> the indirect register access, and then flipping the page back. If you >> interrupt that scheme one way or another, your reads and writes are all >> messed up. > > Ah, and the at803x is a device like that. > > At worst, the autopoll feature could read a register from the wrong > page, and think that the link state has changed when it hasn't. But > that's still bad, and all my problems do revolve around link states. Absolutely, just like it's not clear whether autopoll does read BMSR.LINKSTAT or another at803x specific register? Also how is BMSR.LINKSTAT defined on at803x when there is SGMII + Copper involved? > >>> I forgot one detail. Every time you do an MDIO read/write, it >>> temporarily disables the feature. Although, I think that's not relevant >>> to your point. >> >> Is that done by the HW itself, or is this under SW control exclusively. > > Software. OK, and there is no way you can run into the following race condition: CPU HW MDIO read intent polling starts disable HW autopoll polling continues MDIO read is done MDIO read done polling stops MDIO read value returned if you disable autopolling in HW this is guaranteed to immediately stop by the time the register value is seen in HW and your I/O read/write returns?
On 05/24/2017 04:36 PM, Florian Fainelli wrote: > OK, and there is no way you can run into the following race condition: > > CPU HW > MDIO read intent > polling starts > disable HW autopoll > polling continues Disabling of the HW autopoll waits for the poll to actually stop before continuing. You can see the code here: http://elixir.free-electrons.com/linux/latest/source/drivers/net/ethernet/qualcomm/emac/emac-phy.c#L102 > MDIO read is done > MDIO read done > polling stops > MDIO read value returned > > > if you disable autopolling in HW this is guaranteed to immediately stop > by the time the register value is seen in HW and your I/O read/write > returns? It doesn't immediately stop, but the emac_phy_mdio_autopoll_disable() function waits for the MDIO bus to not be busy. But low-level details of this feature are not documented, so who knows what it does exactly? The original code that used this feature only supported one PHY and never expected there to be any asynchronous MDIO transactios.
Hello Timur, all, sorry for being late to chime in here, as Matthias said I have been AFK while this discussion took place. The related patch I committed is meant to double check that the SGMII side is done aneg when copper side is. Usually it does, but this expectation admittedly might fail in interrupt mode. What seems to happen is: after a power-down-up cycle the at803x restarts aneg at copper and SGMII sides. In polling mode, if you run into the case where copper is up but the SGMII is not, you'll see the message and in (one of) the next poll cycle(s) SGMII should be ready too and the situation recovers. In interrupt mode, obviously once you hit into the partially done aneg, you won't recover since no additional link-change interrupt is going to happen. Wit that, Tamur is right claiming the double-check breaks the driver in interrupt mode. I could think of several work-arounds to fix it, e.g. 1) check if there is an SGMII link-change interrupt source and enable it 2) when we run into partial aneg completion, temporary switch to polling mode and back to interrupt mode when at803x_aneg_done() succeeds Alas, we need to rewind back to whether these double-checks are required at all. As Matthias explained, there is at least one combination of devices that have documented issues establishing the SGMII link - in our case it was the at8031 attached to a P1010's eTSEC (errata A-004187, fixed with chip rev. 2.01). Chances are high that our company is the only one using this exact early version of problematic devices. Therefore, and since the problem is at ETH side, we moved our local workarounds into the gianfar driver who on demand restarts SGMII to recover from the stale link. I guess we need to decide whether we generally need to handle permanent aneg failures on the SGMII link. If we expect that it must not fail (like we assumed until we saw it failing), I agree with Timur and support reverting of the related commit f62265b53e. If otherwise we want this potential failure to be handled correctly, things become arbitrary complex. Essentially, we need to handle such PHYs as a combination of their two sides (copper + SGMII) as virtual sub-PHYs. The phylib might support that in a future version, but for now this seems like a lot of work required to handle a rare problem. tl;dr: ACK with reverting f62265b53e Cheers, Zefir On 05/24/2017 03:29 PM, Timur Tabi wrote: > On 5/24/17 2:18 AM, Matthias May wrote: >> With the patch: When the copper side is seen as up, it also checks if aneg of >> the SGMII link is done. >> As far as i know SGMII can not be run without aneg, since it is always Gbit with >> aneg mandatory. >> If SGMII aneg is not done, then, well aneg is not done and thus 0 is returned. >> >> Internally we have this patch extended so we don't only report that aneg is not >> done but also reset the link. >> Eventually aneg on the SGMII side can be completed and the link comes up. > > I would really like to test this patch. > >> Why do you think that frames are able to go through when aneg is reported as not >> done by the PHY? > > I have two theories: > > 1. The warning message is bogus. The link actually is okay, but the driver thinks > that it isn't. > > 2. The link is not okay, but it automatically fixes itself soon after the > at803x_aneg_done() finishes. > >> Since aneg is mandatory for SGMII this can as well be seen as "link not up", not? > > The problem is that even though the link is up, the driver has returned "0", so > the kernel thinks that autonegotiation has not finished. at803x_aneg_done() is > never called again, and so I think the kernel is disabling the interface is some > secret way. >
On 06/01/2017 06:45 AM, Zefir Kurtisi wrote: > I guess we need to decide whether we generally need to handle permanent aneg > failures on the SGMII link. If we expect that it must not fail (like we assumed > until we saw it failing), I agree with Timur and support reverting of the related > commit f62265b53e. If otherwise we want this potential failure to be handled > correctly, things become arbitrary complex. Essentially, we need to handle such > PHYs as a combination of their two sides (copper + SGMII) as virtual sub-PHYs. The > phylib might support that in a future version, but for now this seems like a lot > of work required to handle a rare problem. I'm about to post a patch that removes interrupt support from the EMAC driver and relies on software polling of the PHY. With this patch, we don't see the "link is not okay" message from that at803x driver any more. The link state is generally more reliable now, even when the at803x driver doesn't complain. My theory is that the hardware polling of the PHY is just too aggressive. I think it continuously reads the PHY status register at maximum speed and immediately issues an interrupt when the PHY says that it's up. So I think we're okay with leaving the at803x driver as-is, since we appear to be no longer getting any false failures.
diff --git a/drivers/net/phy/at803x.c b/drivers/net/phy/at803x.c index cf74d10..a52b560 100644 --- a/drivers/net/phy/at803x.c +++ b/drivers/net/phy/at803x.c @@ -42,10 +42,18 @@ #define AT803X_MMD_ACCESS_CONTROL 0x0D #define AT803X_MMD_ACCESS_CONTROL_DATA 0x0E #define AT803X_FUNC_DATA 0x4003 +#define AT803X_REG_CHIP_CONFIG 0x1f +#define AT803X_BT_BX_REG_SEL 0x8000 #define AT803X_DEBUG_ADDR 0x1D #define AT803X_DEBUG_DATA 0x1E +#define AT803X_MODE_CFG_MASK 0x0F +#define AT803X_MODE_CFG_SGMII 0x01 + +#define AT803X_PSSR 0x11 /*PHY-Specific Status Register*/ +#define AT803X_PSSR_MR_AN_COMPLETE 0x0200 + #define AT803X_DEBUG_REG_0 0x00 #define AT803X_DEBUG_RX_CLK_DLY_EN BIT(15) @@ -355,6 +363,36 @@ static void at803x_link_change_notify(struct phy_device *phydev) } } +static int at803x_aneg_done(struct phy_device *phydev) +{ + int ccr; + + int aneg_done = genphy_aneg_done(phydev); + if (aneg_done != BMSR_ANEGCOMPLETE) + return aneg_done; + + /* + * in SGMII mode, if copper side autoneg is successful, + * also check SGMII side autoneg result + */ + ccr = phy_read(phydev, AT803X_REG_CHIP_CONFIG); + if ((ccr & AT803X_MODE_CFG_MASK) != AT803X_MODE_CFG_SGMII) + return aneg_done; + + /* switch to SGMII/fiber page */ + phy_write(phydev, AT803X_REG_CHIP_CONFIG, ccr & ~AT803X_BT_BX_REG_SEL); + + /* check if the SGMII link is OK. */ + if (!(phy_read(phydev, AT803X_PSSR) & AT803X_PSSR_MR_AN_COMPLETE)) { + pr_warn("803x_aneg_done: SGMII link is not ok\n"); + aneg_done = 0; + } + /* switch back to copper page */ + phy_write(phydev, AT803X_REG_CHIP_CONFIG, ccr | AT803X_BT_BX_REG_SEL); + + return aneg_done; +} + static struct phy_driver at803x_driver[] = { { /* ATHEROS 8035 */ @@ -406,6 +444,7 @@ static struct phy_driver at803x_driver[] = { .flags = PHY_HAS_INTERRUPT, .config_aneg = genphy_config_aneg, .read_status = genphy_read_status, + .aneg_done = at803x_aneg_done, .ack_interrupt = &at803x_ack_interrupt, .config_intr = &at803x_config_intr, } };
In SGMII mode, we observed an autonegotiation issue after power-down-up cycles where the copper side reports successful link establishment but the SGMII side's link is down. This happened in a setup where the at8031 is connected over SGMII to a eTSEC (fsl gianfar), but so far could not be reproduced with other Ethernet device / driver combinations. This commit adds a wrapper function for at8031 that in case of operating in SGMII mode double checks SGMII link state when generic aneg_done() succeeds. It prints a warning on failure but intentionally does not try to recover from this state. As a result, if you ever see a warning '803x_aneg_done: SGMII link is not ok' you will end up having an Ethernet link up but won't get any data through. This should not happen, if it does, please contact the module maintainer. Signed-off-by: Zefir Kurtisi <zefir.kurtisi@neratec.com> --- drivers/net/phy/at803x.c | 39 +++++++++++++++++++++++++++++++++++++++ 1 file changed, 39 insertions(+)