diff mbox

[2/2] at803x: double check SGMII side autoneg

Message ID 1477305654-11328-3-git-send-email-zefir.kurtisi@neratec.com
State Accepted, archived
Delegated to: David Miller
Headers show

Commit Message

Zefir Kurtisi Oct. 24, 2016, 10:40 a.m. UTC
In SGMII mode, we observed an autonegotiation issue
after power-down-up cycles where the copper side
reports successful link establishment but the
SGMII side's link is down.

This happened in a setup where the at8031 is
connected over SGMII to a eTSEC (fsl gianfar),
but so far could not be reproduced with other
Ethernet device / driver combinations.

This commit adds a wrapper function for at8031
that in case of operating in SGMII mode double
checks SGMII link state when generic aneg_done()
succeeds. It prints a warning on failure but
intentionally does not try to recover from this
state. As a result, if you ever see a warning
'803x_aneg_done: SGMII link is not ok' you will
end up having an Ethernet link up but won't get
any data through. This should not happen, if it
does, please contact the module maintainer.

Signed-off-by: Zefir Kurtisi <zefir.kurtisi@neratec.com>
---
 drivers/net/phy/at803x.c | 39 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 39 insertions(+)

Comments

David Miller Oct. 27, 2016, 8:05 p.m. UTC | #1
From: Zefir Kurtisi <zefir.kurtisi@neratec.com>
Date: Mon, 24 Oct 2016 12:40:54 +0200

> In SGMII mode, we observed an autonegotiation issue
> after power-down-up cycles where the copper side
> reports successful link establishment but the
> SGMII side's link is down.
> 
> This happened in a setup where the at8031 is
> connected over SGMII to a eTSEC (fsl gianfar),
> but so far could not be reproduced with other
> Ethernet device / driver combinations.
> 
> This commit adds a wrapper function for at8031
> that in case of operating in SGMII mode double
> checks SGMII link state when generic aneg_done()
> succeeds. It prints a warning on failure but
> intentionally does not try to recover from this
> state. As a result, if you ever see a warning
> '803x_aneg_done: SGMII link is not ok' you will
> end up having an Ethernet link up but won't get
> any data through. This should not happen, if it
> does, please contact the module maintainer.
> 
> Signed-off-by: Zefir Kurtisi <zefir.kurtisi@neratec.com>

Applied.
Timur Tabi Oct. 28, 2016, 10:24 p.m. UTC | #2
Zefir Kurtisi wrote:
> +	/* check if the SGMII link is OK. */
> +	if (!(phy_read(phydev, AT803X_PSSR) & AT803X_PSSR_MR_AN_COMPLETE)) {
> +		pr_warn("803x_aneg_done: SGMII link is not ok\n");
> +		aneg_done = 0;

I see this message appear sometimes when bring up the interface via 
ifup.  However, contrary to your patch description, everything seems to 
work:

$ iperf3 -c 192.168.1.1 -t 3600
Connecting to host 192.168.1.1, port 5201
[  4] local 192.168.1.2 port 52640 connected to 192.168.1.1 port 5201
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.00   sec   109 MBytes   909 Mbits/sec    0    485 KBytes
[  4]   1.00-2.00   sec   108 MBytes   902 Mbits/sec    0    622 KBytes

I wonder if you're impacted with all of the pause frame problems I'm having.
Zefir Kurtisi Nov. 1, 2016, 11:13 a.m. UTC | #3
On 10/29/2016 12:24 AM, Timur Tabi wrote:
> Zefir Kurtisi wrote:
>> +    /* check if the SGMII link is OK. */
>> +    if (!(phy_read(phydev, AT803X_PSSR) & AT803X_PSSR_MR_AN_COMPLETE)) {
>> +        pr_warn("803x_aneg_done: SGMII link is not ok\n");
>> +        aneg_done = 0;
> 
> I see this message appear sometimes when bring up the interface via ifup. 
> However, contrary to your patch description, everything seems to work:
> 
Right so, seeing the message and still having the SGMII link up and working is
what happens most. But randomly (in our setup it is in the order of 1%) we see
that SGMII remains down after a suspend-resume cycle. This message is a required
but not sufficient condition.

> $ iperf3 -c 192.168.1.1 -t 3600
> Connecting to host 192.168.1.1, port 5201
> [  4] local 192.168.1.2 port 52640 connected to 192.168.1.1 port 5201
> [ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
> [  4]   0.00-1.00   sec   109 MBytes   909 Mbits/sec    0    485 KBytes
> [  4]   1.00-2.00   sec   108 MBytes   902 Mbits/sec    0    622 KBytes
> 
> I wonder if you're impacted with all of the pause frame problems I'm having.
> 

I doubt, since resetting SGMII link alone (via toggling MII_BMCR on fibre page)
brings back the SGMII communication, I assume it to be below the MAC layer.

Are you aware of any related erratas for the at803x? I am not and therefore left
alone with my guesswork.
Timur Tabi Jan. 17, 2017, 11:32 p.m. UTC | #4
On 10/24/2016 05:40 AM, Zefir Kurtisi wrote:
> As a result, if you ever see a warning
> '803x_aneg_done: SGMII link is not ok' you will
> end up having an Ethernet link up but won't get
> any data through. This should not happen, if it
> does, please contact the module maintainer.

I am now seeing this:

ubuntu@ubuntu:~$ ifup eth1
ubuntu@ubuntu:~$ [  588.687689] 803x_aneg_done: SGMII link is not ok
[  588.694909] qcom-emac QCOM8070:00 eth1: Link is Up - 1Gbps/Full - flow 
control rx/tx
[  588.703985] qcom-emac QCOM8070:00 eth1: Link is Up - 1Gbps/Full - flow 
control rx/tx

ubuntu@ubuntu:~$ ping 192.168.3.1
PING 192.168.3.1 (192.168.3.1) 56(84) bytes of data.
64 bytes from 192.168.3.1: icmp_seq=1 ttl=64 time=0.502 ms
64 bytes from 192.168.3.1: icmp_seq=2 ttl=64 time=0.244 ms
64 bytes from 192.168.3.1: icmp_seq=3 ttl=64 time=0.220 ms
^C
--- 192.168.3.1 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2107ms
rtt min/avg/max/mdev = 0.220/0.322/0.502/0.127 ms

So I do get the "SGMII link is not ok" message, but my connection is fine.  I 
don't know why the link-up message is displayed twice.  It's only displayed 
once if I use the genphy driver instead of the at803x driver.

I'm going to debug the at803x to see what it does that causes the double 
link-up message.
Zefir Kurtisi Jan. 18, 2017, 11 a.m. UTC | #5
On 01/18/2017 12:32 AM, Timur Tabi wrote:
> On 10/24/2016 05:40 AM, Zefir Kurtisi wrote:
>> As a result, if you ever see a warning
>> '803x_aneg_done: SGMII link is not ok' you will
>> end up having an Ethernet link up but won't get
>> any data through. This should not happen, if it
>> does, please contact the module maintainer.
> 
> I am now seeing this:
> 
> ubuntu@ubuntu:~$ ifup eth1
> ubuntu@ubuntu:~$ [  588.687689] 803x_aneg_done: SGMII link is not ok
> [  588.694909] qcom-emac QCOM8070:00 eth1: Link is Up - 1Gbps/Full - flow control
> rx/tx
> [  588.703985] qcom-emac QCOM8070:00 eth1: Link is Up - 1Gbps/Full - flow control
> rx/tx
> 
> ubuntu@ubuntu:~$ ping 192.168.3.1
> PING 192.168.3.1 (192.168.3.1) 56(84) bytes of data.
> 64 bytes from 192.168.3.1: icmp_seq=1 ttl=64 time=0.502 ms
> 64 bytes from 192.168.3.1: icmp_seq=2 ttl=64 time=0.244 ms
> 64 bytes from 192.168.3.1: icmp_seq=3 ttl=64 time=0.220 ms
> ^C
> --- 192.168.3.1 ping statistics ---
> 3 packets transmitted, 3 received, 0% packet loss, time 2107ms
> rtt min/avg/max/mdev = 0.220/0.322/0.502/0.127 ms
> 
> So I do get the "SGMII link is not ok" message, but my connection is fine.  I
> don't know why the link-up message is displayed twice.  It's only displayed once
> if I use the genphy driver instead of the at803x driver.
> 
> I'm going to debug the at803x to see what it does that causes the double link-up
> message.
> 

The fact that you see the warning means external autoneg completes before the
SGMII side in best case or SGMII link remains down in worst case. To prevent this,
I am using a private variant of the at8031 driver that ensures that the SGMII
autoneg is never restarted. If you ever end up with a dead link, feel free to test
with the related functions below.


Cheers,
Zefir

---
static int nt_at8031_no_soft_reset(struct phy_device *phydev)
{
	return 0;
}

/*
 * Powering the chip down occasionally causes SGMII link loss, which in turn
 * causes the connection to gianfar to remain down.
 *
 * To prevent permanent link loss, instead of power down just isolate pins.
 */
static int nt_at8031_suspend(struct phy_device *phydev)
{
	mutex_lock(&phydev->lock);
	phy_write(phydev, MII_BMCR, phy_read(phydev, MII_BMCR) | BMCR_ISOLATE);
	mutex_unlock(&phydev->lock);
	return 0;
}

static int nt_at8031_resume(struct phy_device *phydev)
{
	mutex_lock(&phydev->lock);
	phy_write(phydev, MII_BMCR, phy_read(phydev, MII_BMCR) & ~BMCR_ISOLATE);
	mutex_unlock(&phydev->lock);
	return 0;
}
Timur Tabi Jan. 18, 2017, 1:13 p.m. UTC | #6
Zefir Kurtisi wrote:
> The fact that you see the warning means external autoneg completes before the
> SGMII side in best case or SGMII link remains down in worst case.

So I'm no expert on this.  Are you saying that I might possibly be doing 
things backwards in my driver?  That is, I should be configuring the 
SGMII side of the link before I start autonegotiation?
Zefir Kurtisi Jan. 18, 2017, 1:53 p.m. UTC | #7
On 01/18/2017 02:13 PM, Timur Tabi wrote:
> Zefir Kurtisi wrote:
>> The fact that you see the warning means external autoneg completes before the
>> SGMII side in best case or SGMII link remains down in worst case.
> 
> So I'm no expert on this.  Are you saying that I might possibly be doing things
> backwards in my driver?  That is, I should be configuring the SGMII side of the
> link before I start autonegotiation?
> 
No, not necessarily. The SGMII link default configuration is set such that you do
not have to bother at all.

That is, if in your case you see the warning popping up but the link always
regains connection, then it is an ignorable false positive.
Timur Tabi Jan. 18, 2017, 3:02 p.m. UTC | #8
On 01/18/2017 07:53 AM, Zefir Kurtisi wrote:
> No, not necessarily. The SGMII link default configuration is set such that you do
> not have to bother at all.

Does the SGMII link need to make the speed and duplex of the external link?

https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/drivers/net/ethernet/qualcomm/emac/emac-sgmii.c#n50

Here is where I can configure the SGMII link to match whatever phydev says 
the external link is.  This is code that was handed down to me.  I've never 
really understood what the purpose was.  Shouldn't I just be able to 
configure the SGMII link to 1Gbs full-duplex, regardless of what the external 
link is set to?

> That is, if in your case you see the warning popping up but the link always
> regains connection, then it is an ignorable false positive.

Unfortunately, I can't really ignore it because genphy does this:

$ ifup eth1
[ 1034.754276] qcom-emac QCOM8070:00 eth1: Link is Up - 1Gbps/Full - flow 
control rx/tx

But the at803x driver does this:

$ ifup eth1
[ 1020.507728] 803x_aneg_done: SGMII link is not ok
[ 1020.513517] qcom-emac QCOM8070:00 eth1: Link is Up - 1Gbps/Full - flow 
control rx/tx
[ 1020.522839] qcom-emac QCOM8070:00 eth1: Link is Up - 1Gbps/Full - flow 
control rx/tx

Customers are going to complain.
Zefir Kurtisi Jan. 19, 2017, 9:43 a.m. UTC | #9
On 01/18/2017 04:02 PM, Timur Tabi wrote:
> On 01/18/2017 07:53 AM, Zefir Kurtisi wrote:
>> No, not necessarily. The SGMII link default configuration is set such that you do
>> not have to bother at all.
> 
> Does the SGMII link need to make the speed and duplex of the external link?
> 
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/drivers/net/ethernet/qualcomm/emac/emac-sgmii.c#n50
> 
> 
> Here is where I can configure the SGMII link to match whatever phydev says the
> external link is.  This is code that was handed down to me.  I've never really
> understood what the purpose was.  Shouldn't I just be able to configure the SGMII
> link to 1Gbs full-duplex, regardless of what the external link is set to?
> 

This is because the SGMII link is a transparent interface to the upper layer with
no means to inform the other side of the link type in the lower layer.

It always operates at 675MHz, which with two lines gives 1.25Gbps, which at 10/8
coding gives exactly 1Gbps net data rate. If the at803x's copper side
autonegotiates to 1Gbps, the bits traversing over the SGMII match the copper side
1:1. In case the copper side autonegotiates to e.g. 100Mbps, each bit at the
copper side on the SGMII bus is replicated and sent 10x times - or 100x times in
case of 10Mbps. The MAC side of the ETH needs to be aware of how the SGMII data
has to be interpreted, and this is why you have to set the bits you are referring to.

>> That is, if in your case you see the warning popping up but the link always
>> regains connection, then it is an ignorable false positive.
> 
> Unfortunately, I can't really ignore it because genphy does this:
> 
> $ ifup eth1
> [ 1034.754276] qcom-emac QCOM8070:00 eth1: Link is Up - 1Gbps/Full - flow control
> rx/tx
> 
> But the at803x driver does this:
> 
> $ ifup eth1
> [ 1020.507728] 803x_aneg_done: SGMII link is not ok
> [ 1020.513517] qcom-emac QCOM8070:00 eth1: Link is Up - 1Gbps/Full - flow control
> rx/tx
> [ 1020.522839] qcom-emac QCOM8070:00 eth1: Link is Up - 1Gbps/Full - flow control
> rx/tx
> 
> Customers are going to complain.
> 
Yes, the double link-status message is annoying - I was referring to the other
message.

To track down who is causing the additional message, I would proceed with
following technique that helped me dig down a similar problem: since you control
the events in question and there is no risk of flooding the kernel log, in the top
of phy.c::phy_print_status() add a dump_stack() call. In the debug log ensure that
all of the traces end up in the same phydev->adjust_link() callback handler (in
your case emac_adjust_link()).

In the gianfar's handler there is an additional check whether the link really
changed before phy_print_status() is called, in emac_adjust_link() this happens
unconditionally - maybe that is the problem you are facing.

@Florian: is it safe to assume that when phydev->adjust_link() is called there was
in fact a link change (link, duplex, pause), or does the handler have to double
check for a change? I see some ETH drivers maintain a private old state instance
for that, while others don't.


Cheers,
Zefir
Florian Fainelli Jan. 19, 2017, 6:01 p.m. UTC | #10
On 01/19/2017 01:43 AM, Zefir Kurtisi wrote:
> On 01/18/2017 04:02 PM, Timur Tabi wrote:
>> On 01/18/2017 07:53 AM, Zefir Kurtisi wrote:
>>> No, not necessarily. The SGMII link default configuration is set such that you do
>>> not have to bother at all.
>>
>> Does the SGMII link need to make the speed and duplex of the external link?
>>
>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/drivers/net/ethernet/qualcomm/emac/emac-sgmii.c#n50
>>
>>
>> Here is where I can configure the SGMII link to match whatever phydev says the
>> external link is.  This is code that was handed down to me.  I've never really
>> understood what the purpose was.  Shouldn't I just be able to configure the SGMII
>> link to 1Gbs full-duplex, regardless of what the external link is set to?
>>
> 
> This is because the SGMII link is a transparent interface to the upper layer with
> no means to inform the other side of the link type in the lower layer.
> 
> It always operates at 675MHz, which with two lines gives 1.25Gbps, which at 10/8
> coding gives exactly 1Gbps net data rate. If the at803x's copper side
> autonegotiates to 1Gbps, the bits traversing over the SGMII match the copper side
> 1:1. In case the copper side autonegotiates to e.g. 100Mbps, each bit at the
> copper side on the SGMII bus is replicated and sent 10x times - or 100x times in
> case of 10Mbps. The MAC side of the ETH needs to be aware of how the SGMII data
> has to be interpreted, and this is why you have to set the bits you are referring to.
> 
>>> That is, if in your case you see the warning popping up but the link always
>>> regains connection, then it is an ignorable false positive.
>>
>> Unfortunately, I can't really ignore it because genphy does this:
>>
>> $ ifup eth1
>> [ 1034.754276] qcom-emac QCOM8070:00 eth1: Link is Up - 1Gbps/Full - flow control
>> rx/tx
>>
>> But the at803x driver does this:
>>
>> $ ifup eth1
>> [ 1020.507728] 803x_aneg_done: SGMII link is not ok
>> [ 1020.513517] qcom-emac QCOM8070:00 eth1: Link is Up - 1Gbps/Full - flow control
>> rx/tx
>> [ 1020.522839] qcom-emac QCOM8070:00 eth1: Link is Up - 1Gbps/Full - flow control
>> rx/tx
>>
>> Customers are going to complain.
>>
> Yes, the double link-status message is annoying - I was referring to the other
> message.
> 
> To track down who is causing the additional message, I would proceed with
> following technique that helped me dig down a similar problem: since you control
> the events in question and there is no risk of flooding the kernel log, in the top
> of phy.c::phy_print_status() add a dump_stack() call. In the debug log ensure that
> all of the traces end up in the same phydev->adjust_link() callback handler (in
> your case emac_adjust_link()).
> 
> In the gianfar's handler there is an additional check whether the link really
> changed before phy_print_status() is called, in emac_adjust_link() this happens
> unconditionally - maybe that is the problem you are facing.
> 
> @Florian: is it safe to assume that when phydev->adjust_link() is called there was
> in fact a link change (link, duplex, pause), or does the handler have to double
> check for a change? I see some ETH drivers maintain a private old state instance
> for that, while others don't.

Yes, it is not just safe, it is a contract. The reason most drivers
cache the values from one run on adjust_link to the other is because you
don't want to needlessly read/write registers if nothing has changed, or
just a subset of link parameters did change.

NB: at some point I was considering bringing this caching into the core
PHY library to save some housekeeping code in drivers...
Timur Tabi Jan. 20, 2017, 2:38 a.m. UTC | #11
Zefir Kurtisi wrote:
> It always operates at 675MHz, which with two lines gives 1.25Gbps,
> which at 10/8 coding gives exactly 1Gbps net data rate. If the
> at803x's copper side autonegotiates to 1Gbps, the bits traversing
> over the SGMII match the copper side 1:1. In case the copper side
> autonegotiates to e.g. 100Mbps, each bit at the copper side on the
> SGMII bus is replicated and sent 10x times - or 100x times in case of
> 10Mbps. The MAC side of the ETH needs to be aware of how the SGMII
> data has to be interpreted, and this is why you have to set the bits
> you are referring to.

So does this mean that the SGMII link should not be autonegotiated? I 
currently have this code:

     if (phydev->autoneg == AUTONEG_ENABLE) {
         val &= ~(FORCE_AN_RX_CFG | FORCE_AN_TX_CFG);
         val |= AN_ENABLE;
         writel(val, phy->base + EMAC_SGMII_PHY_AUTONEG_CFG2);
     } else {
         ...

So if the external PHY is set to autonegotiate, then the SGMII block is 
set to also negotiate.  Now that I think about it, this seems wrong. 
And in fact, I'm not sure how it works.  It seems that the this only 
makes sense if the SGMII block is configured to act as the only PHY. 
This is an option that the hardware supports but my driver does not.  So 
perhaps I should remove this part, and just do the rest:


	u32 speed_cfg;

	switch (phydev->speed) {
	case SPEED_10:
		speed_cfg = SPDMODE_10;
		break;
	case SPEED_100:
		speed_cfg = SPDMODE_100;
		break;
	case SPEED_1000:
		speed_cfg = SPDMODE_1000;
		break;
	default:
		return -EINVAL;
	}

	if (phydev->duplex == DUPLEX_FULL)
		speed_cfg |= DUPLEX_MODE;

	val &= ~AN_ENABLE;
	writel(speed_cfg, phy->base + EMAC_SGMII_PHY_SPEED_CFG1);
	writel(val, phy->base + EMAC_SGMII_PHY_AUTONEG_CFG2);

Should I be doing this all the time?

> To track down who is causing the additional message, I would proceed
> with following technique that helped me dig down a similar problem:
> since you control the events in question and there is no risk of
> flooding the kernel log, in the top of phy.c::phy_print_status() add
> a dump_stack() call. In the debug log ensure that all of the traces
> end up in the same phydev->adjust_link() callback handler (in your
> case emac_adjust_link()).

That's a good idea, thanks.
Zefir Kurtisi Jan. 20, 2017, 3:31 p.m. UTC | #12
On 01/20/2017 03:38 AM, Timur Tabi wrote:
> Zefir Kurtisi wrote:
>> It always operates at 675MHz, which with two lines gives 1.25Gbps,
>> which at 10/8 coding gives exactly 1Gbps net data rate. If the
>> at803x's copper side autonegotiates to 1Gbps, the bits traversing
>> over the SGMII match the copper side 1:1. In case the copper side
>> autonegotiates to e.g. 100Mbps, each bit at the copper side on the
>> SGMII bus is replicated and sent 10x times - or 100x times in case of
>> 10Mbps. The MAC side of the ETH needs to be aware of how the SGMII
>> data has to be interpreted, and this is why you have to set the bits
>> you are referring to.
> 
> So does this mean that the SGMII link should not be autonegotiated? I currently
> have this code:
> 
>     if (phydev->autoneg == AUTONEG_ENABLE) {
>         val &= ~(FORCE_AN_RX_CFG | FORCE_AN_TX_CFG);
>         val |= AN_ENABLE;
>         writel(val, phy->base + EMAC_SGMII_PHY_AUTONEG_CFG2);
>     } else {
>         ...
> 
> So if the external PHY is set to autonegotiate, then the SGMII block is set to
> also negotiate.  Now that I think about it, this seems wrong. And in fact, I'm not
> sure how it works.  It seems that the this only makes sense if the SGMII block is
> configured to act as the only PHY. This is an option that the hardware supports
> but my driver does not.  So perhaps I should remove this part, and just do the rest:
> 
> 
>     u32 speed_cfg;
> 
>     switch (phydev->speed) {
>     case SPEED_10:
>         speed_cfg = SPDMODE_10;
>         break;
>     case SPEED_100:
>         speed_cfg = SPDMODE_100;
>         break;
>     case SPEED_1000:
>         speed_cfg = SPDMODE_1000;
>         break;
>     default:
>         return -EINVAL;
>     }
> 
>     if (phydev->duplex == DUPLEX_FULL)
>         speed_cfg |= DUPLEX_MODE;
> 
>     val &= ~AN_ENABLE;
>     writel(speed_cfg, phy->base + EMAC_SGMII_PHY_SPEED_CFG1);
>     writel(val, phy->base + EMAC_SGMII_PHY_AUTONEG_CFG2);
> 
> Should I be doing this all the time?
> 
Hm, that might be product dependent. From my experience with the gianfar/at8031
this is what I can share: there is a known flaw in some revisions of the eTSEC
that causes random failures in SGMII autonegotiation. As a workaround FSL proposes
to use fixed speed. I spent lots of time getting this working, but in the end it
turned out it does not. Setting fixed Gbps speed on both ends of the link is
reported to be ok (link failure bits clean, page transmitted ok), but no data goes
through the link.

From a source I can't remember atm I was told that autonegotiation is mandatory
for SGMII, therefore I believe there is no need to change the autoneg bits for
SGMII in your case. I'd assume that resetting the link before setting the speed
value should be enough, i.e. the second write to EMAC_SGMII_PHY_AUTONEG_CFG2 might
be useless. YMMV

>> To track down who is causing the additional message, I would proceed
>> with following technique that helped me dig down a similar problem:
>> since you control the events in question and there is no risk of
>> flooding the kernel log, in the top of phy.c::phy_print_status() add
>> a dump_stack() call. In the debug log ensure that all of the traces
>> end up in the same phydev->adjust_link() callback handler (in your
>> case emac_adjust_link()).
> 
> That's a good idea, thanks.
> 
As Florian responded, it seems you need to double check for real changes before
printing the link status to get rid of the double printed notifications.

Good Luck.
Timur Tabi May 22, 2017, 8:12 p.m. UTC | #13
On 10/24/2016 05:40 AM, Zefir Kurtisi wrote:
> This commit adds a wrapper function for at8031
> that in case of operating in SGMII mode double
> checks SGMII link state when generic aneg_done()
> succeeds. It prints a warning on failure but
> intentionally does not try to recover from this
> state. As a result, if you ever see a warning
> '803x_aneg_done: SGMII link is not ok' you will
> end up having an Ethernet link up but won't get
> any data through. This should not happen, if it
> does, please contact the module maintainer.

I'm getting bitten by this one again.  We're now have several systems that
are reporting the link failure ("803x_aneg_done: SGMII link is not ok"), and
the interface comes up but is not functional.  I believe this is expected.

The problem, however, is not because of the link failure.  Instead, the
problem is this:

> +	/* check if the SGMII link is OK. */
> +	if (!(phy_read(phydev, AT803X_PSSR) & AT803X_PSSR_MR_AN_COMPLETE)) {
> +		pr_warn("803x_aneg_done: SGMII link is not ok\n");
> +		aneg_done = 0;

Returning zero is what breaks the interface.  If I comment-out this last
line, so that at803x_aneg_done() returns BMSR_ANEGCOMPLETE instead, then
everything works.

The documentation for phy_aneg_done() says this:

 * Description: Return the auto-negotiation status from this @phydev
 * Returns > 0 on success or < 0 on error. 0 means that auto-negotiation
 * is still pending.

So I think there are two issues here:

1. What exactly is supposed to happen when phy_aneg_done() returns a zero?
On our system, returning a zero results in a broken link, even though there
are no errors reported.  I just can't send any packets.

2. I'm preparing a patch that adds a command-line parameter to at803x that
makes this code conditional.  If you specify the parameter ("linkcheck")
then it will check the link and return 0 on failure.  Otherwise, it will
return whether genphy_aneg_done() returns.  The question is, should it still
print the message?

What I cannot determine is whether or not the link is actually okay.  It
appears to me that the driver says the link is not ok, but in truth it
actually is, and maybe the whole at803x_aneg_done() function based on a
false premise.
Andrew Lunn May 22, 2017, 9:02 p.m. UTC | #14
> 2. I'm preparing a patch that adds a command-line parameter to at803x that
> makes this code conditional.

FYI:

A patch with a command line argument, i think you actually mean a
module argument, is very likely to be rejected.

  Andrew
Andrew Lunn May 22, 2017, 9:09 p.m. UTC | #15
On Mon, May 22, 2017 at 03:12:03PM -0500, Timur Tabi wrote:
> On 10/24/2016 05:40 AM, Zefir Kurtisi wrote:
> > This commit adds a wrapper function for at8031
> > that in case of operating in SGMII mode double
> > checks SGMII link state when generic aneg_done()
> > succeeds. It prints a warning on failure but
> > intentionally does not try to recover from this
> > state. As a result, if you ever see a warning
> > '803x_aneg_done: SGMII link is not ok' you will
> > end up having an Ethernet link up but won't get
> > any data through. This should not happen, if it
> > does, please contact the module maintainer.
> 
> I'm getting bitten by this one again.  We're now have several systems that
> are reporting the link failure ("803x_aneg_done: SGMII link is not ok"), and
> the interface comes up but is not functional.  I believe this is expected.
> 
> The problem, however, is not because of the link failure.  Instead, the
> problem is this:
> 
> > +	/* check if the SGMII link is OK. */
> > +	if (!(phy_read(phydev, AT803X_PSSR) & AT803X_PSSR_MR_AN_COMPLETE)) {
> > +		pr_warn("803x_aneg_done: SGMII link is not ok\n");
> > +		aneg_done = 0;
> 
> Returning zero is what breaks the interface.  If I comment-out this last
> line, so that at803x_aneg_done() returns BMSR_ANEGCOMPLETE instead, then
> everything works.

Are you using interrupts? Or polling?

If polling, it should come back again 1 second later and see if
auto-neg has completed. Hopefully the SGMII side comes up eventually.

If you are using interrupts, you need another interrupt when the SGMII
side comes up, otherwise i think the state machine is stuck waiting.

     Andrew
Florian Fainelli May 22, 2017, 9:10 p.m. UTC | #16
On 05/22/2017 02:02 PM, Andrew Lunn wrote:
>> 2. I'm preparing a patch that adds a command-line parameter to at803x that
>> makes this code conditional.
> 
> FYI:
> 
> A patch with a command line argument, i think you actually mean a
> module argument, is very likely to be rejected.

Even a module argument would be rejected. If you need platform/SoC
specific behavior propagated down to the PHY driver, several options exist:

- pass an agreed upon value for phy_flags to of_phy_connect() see
drivers/net/ethernet/broadcom/tg3.c and
drivers/net/ethernet/broadcom/genet/bcmgenet.c for instance and update
the driver to act on that "flags" see drivers/net/phy/broadcom.c and
drivers/net/phy/bcm7xxx.c

- register a PHY fixup which is specific to the board/SoC, and have the
PHY fixup do whatever is necessary for your platform (like setting
specific registers)

Preference goes for the first solution, but phy_flags is just a 32-bits
integer, so you could run out of bits.
Timur Tabi May 22, 2017, 9:19 p.m. UTC | #17
On 05/22/2017 04:10 PM, Florian Fainelli wrote:
> Even a module argument would be rejected. If you need platform/SoC
> specific behavior propagated down to the PHY driver, several options exist:
> 
> - pass an agreed upon value for phy_flags to of_phy_connect() see
> drivers/net/ethernet/broadcom/tg3.c and
> drivers/net/ethernet/broadcom/genet/bcmgenet.c for instance and update
> the driver to act on that "flags" see drivers/net/phy/broadcom.c and
> drivers/net/phy/bcm7xxx.c

Will this work on ACPI systems as well?  I call phy_connect_direct() instead
of of_phy_connect().  I see some drivers set phydev->dev_flags before
calling phy_connect_direct().

My concern is that this problem occurs only on an at8031 chip, so having my
network driver passing an at8031-specific flag seems out of place.  What
happens if, on some other board, a different PHY is used, and that flag
means something else?

> - register a PHY fixup which is specific to the board/SoC, and have the
> PHY fixup do whatever is necessary for your platform (like setting
> specific registers)

Do you have an example of that?
Timur Tabi May 22, 2017, 9:29 p.m. UTC | #18
On 05/22/2017 04:09 PM, Andrew Lunn wrote:
> Are you using interrupts? Or polling?

adpt->phydev->irq = PHY_IGNORE_INTERRUPT;
ret = phy_connect_direct(netdev, adpt->phydev, emac_adjust_link,
			 PHY_INTERFACE_MODE_SGMII);

Technically it's polling, except that it's my NIC's hardware that is polling
the MDIO bus, and then generating an interrupt when there's a link state change.

When the link state changes, I call phy_mac_interrupt()

if (status & ISR_GPHY_LINK)
	phy_mac_interrupt(adpt->phydev, !!(status & GPHY_LINK_UP_INT));

> If polling, it should come back again 1 second later and see if
> auto-neg has completed. Hopefully the SGMII side comes up eventually.
> 
> If you are using interrupts, you need another interrupt when the SGMII
> side comes up, otherwise i think the state machine is stuck waiting.

I'll have to test this, but what do I do if I don't get another interrupt?
I have a suspicion that the link is actually okay, and that the error is bogus.
Andrew Lunn May 22, 2017, 9:32 p.m. UTC | #19
> I'll have to test this, but what do I do if I don't get another interrupt?

It probably means interrupts cannot be used. Poll it.

	   Andrew
Florian Fainelli May 22, 2017, 9:50 p.m. UTC | #20
On 05/22/2017 02:19 PM, Timur Tabi wrote:
> On 05/22/2017 04:10 PM, Florian Fainelli wrote:
>> Even a module argument would be rejected. If you need platform/SoC
>> specific behavior propagated down to the PHY driver, several options exist:
>>
>> - pass an agreed upon value for phy_flags to of_phy_connect() see
>> drivers/net/ethernet/broadcom/tg3.c and
>> drivers/net/ethernet/broadcom/genet/bcmgenet.c for instance and update
>> the driver to act on that "flags" see drivers/net/phy/broadcom.c and
>> drivers/net/phy/bcm7xxx.c
> 
> Will this work on ACPI systems as well?

Provided you get a reference on the PHY dvice first, yes.

>  I call phy_connect_direct() instead
> of of_phy_connect().  I see some drivers set phydev->dev_flags before
> calling phy_connect_direct().

Setting phydev->dev_flags before calling phy_connect_direct() is okay
and will work here. The key thing is that you will need to get a PHY
device reference before, which would already be populated in your MDIO
bus's mdio_map array, see e.g: mdiobus_get_phy().

You can also set phydev->dev_flags *after* calling phy_connect_direct().
The only reason why it should be done before, is to guarantee that
phydrv::config_init would *see* the correct value there. If you need it
at a later time, like in config_aneg() or aneg_done(), setting it later
*might work*, but you'd have to trigger a auto-negotiation restart to
avoid races between phy_connect_direct() returning, and
phydrv::config_aneg() being called.

> 
> My concern is that this problem occurs only on an at8031 chip, so having my
> network driver passing an at8031-specific flag seems out of place.  What
> happens if, on some other board, a different PHY is used, and that flag
> means something else?

There needs to be an agreement on what the flags bits mean, and this
needs to be in a shared header file that other network drivers can
reference and where they can allocate their own bits if existing
functionality is not covered. The allocation of such flags is centered
around the perspective of the PHY driver.

Of course, this only works if you have a strict mapping between the
Ethernet MAC and the PHY, and if your MAC only uses the same PHY device
driver. If that's not the case, you need to make sure you don't pass a
phy_flags with bits set that could influence the behavior of another PHY
driver that would also try to do something with these phy_flags. Fairly
positive you can figure this out from your Ethernet MAC driver.

> 
>> - register a PHY fixup which is specific to the board/SoC, and have the
>> PHY fixup do whatever is necessary for your platform (like setting
>> specific registers)
> 
> Do you have an example of that?

You could have grepped for phy_register_fixup() but the best example I
can come up with is:

drivers/net/usb/lan78xx.c

and there are a lot more in arch/{arm,powerpc}/:

arch/arm/mach-davinci/board-dm644x-evm.c:
phy_register_fixup_for_uid(LXT971_PHY_ID, LXT971_PHY_MASK,
arch/arm/mach-imx/mach-imx6q.c:
phy_register_fixup_for_uid(PHY_ID_KSZ9021, MICREL_PHY_ID_MASK,
arch/arm/mach-imx/mach-imx6q.c:
phy_register_fixup_for_uid(PHY_ID_KSZ9031, MICREL_PHY_ID_MASK,
arch/arm/mach-imx/mach-imx6q.c:
phy_register_fixup_for_uid(PHY_ID_AR8031, 0xffffffef,
arch/arm/mach-imx/mach-imx6q.c:
phy_register_fixup_for_uid(PHY_ID_AR8035, 0xffffffef,
arch/arm/mach-imx/mach-imx6sx.c:
phy_register_fixup_for_uid(PHY_ID_AR8031, 0xffffffff,
arch/arm/mach-imx/mach-imx6ul.c:
phy_register_fixup_for_uid(PHY_ID_KSZ8081, MICREL_PHY_ID_MASK,
arch/arm/mach-imx/mach-imx7d.c:
phy_register_fixup_for_uid(PHY_ID_AR8031, 0xffffffff,
arch/arm/mach-imx/mach-imx7d.c:
phy_register_fixup_for_uid(PHY_ID_BCM54220, 0xffffffff,
arch/arm/mach-mxs/mach-mxs.c:
phy_register_fixup_for_uid(PHY_ID_KSZ8051, MICREL_PHY_ID_MASK,
arch/arm/mach-orion5x/dns323-setup.c:
phy_register_fixup_for_uid(MARVELL_PHY_ID_88E1118,
arch/powerpc/platforms/85xx/mpc85xx_mds.c:
phy_register_fixup_for_id(phy_id, mpc8568_fixup_125_clock);
arch/powerpc/platforms/85xx/mpc85xx_mds.c:
phy_register_fixup_for_id(phy_id, mpc8568_mds_phy_fixups);
arch/powerpc/platforms/85xx/mpc85xx_mds.c:
phy_register_fixup_for_id(phy_id, mpc8568_mds_phy_fixups);
Timur Tabi May 23, 2017, 3:54 p.m. UTC | #21
On 05/22/2017 04:32 PM, Andrew Lunn wrote:
>> I'll have to test this, but what do I do if I don't get another interrupt?
> It probably means interrupts cannot be used. Poll it.

I will test that to see what happens, but I believe the real problem is that
the at803x driver is lying when it says that the link is not okay.  I think
the link is okay, and that's why I'm not getting any more interrupts.  I
don't think I should have to drop interrupt support in my MAC driver because
one specific PHY driver is broken.
Andrew Lunn May 23, 2017, 4:07 p.m. UTC | #22
On Tue, May 23, 2017 at 10:54:57AM -0500, Timur Tabi wrote:
> On 05/22/2017 04:32 PM, Andrew Lunn wrote:
> >> I'll have to test this, but what do I do if I don't get another interrupt?
> > It probably means interrupts cannot be used. Poll it.
> 
> I will test that to see what happens, but I believe the real problem is that
> the at803x driver is lying when it says that the link is not okay.  I think
> the link is okay, and that's why I'm not getting any more interrupts.  I
> don't think I should have to drop interrupt support in my MAC driver because
> one specific PHY driver is broken.

If it turns out the PHY hardware is broken, the phy driver itself can
force it back to polling by setting phydev->irq to PHY_POLL in its
probe() function.

	Andrew
Timur Tabi May 23, 2017, 4:33 p.m. UTC | #23
On 05/23/2017 11:07 AM, Andrew Lunn wrote:
>> > I will test that to see what happens, but I believe the real problem is that
>> > the at803x driver is lying when it says that the link is not okay.  I think
>> > the link is okay, and that's why I'm not getting any more interrupts.  I
>> > don't think I should have to drop interrupt support in my MAC driver because
>> > one specific PHY driver is broken.
> If it turns out the PHY hardware is broken, the phy driver itself can
> force it back to polling by setting phydev->irq to PHY_POLL in its
> probe() function.

I don't think the hardware is broken, I think the driver is broken.  The
patch that sets aneg_done to 0 should be reverted or restricted somehow.

Even the developer of the patch admits that if the warning message is
displayed, the link will appear to be up, but no packets will go through.
Perhaps that's because the driver is returning 0 instead of BMSR_ANEGCOMPLETE?

Would it be okay for the PHY driver to query a property from the device tree
directly (e.g. "qca,check-sgmii-link"), and if present, only then implement
the sgmii link check?  So in at803x_probe(), I would do something like this:

	if (device_property_read_bool(&phydev->mdio.dev,
					"qca,check-sgmii-link")
		priv->check_sgmii_link = true;
Matthias May May 24, 2017, 7:18 a.m. UTC | #24
On 23/05/17 18:33, Timur Tabi wrote:
> On 05/23/2017 11:07 AM, Andrew Lunn wrote:
>>>> I will test that to see what happens, but I believe the real problem is that
>>>> the at803x driver is lying when it says that the link is not okay.  I think
>>>> the link is okay, and that's why I'm not getting any more interrupts.  I
>>>> don't think I should have to drop interrupt support in my MAC driver because
>>>> one specific PHY driver is broken.
>> If it turns out the PHY hardware is broken, the phy driver itself can
>> force it back to polling by setting phydev->irq to PHY_POLL in its
>> probe() function.
> 
> I don't think the hardware is broken, I think the driver is broken.  The
> patch that sets aneg_done to 0 should be reverted or restricted somehow.
> 
> Even the developer of the patch admits that if the warning message is
> displayed, the link will appear to be up, but no packets will go through.
> Perhaps that's because the driver is returning 0 instead of BMSR_ANEGCOMPLETE?
> 
> Would it be okay for the PHY driver to query a property from the device tree
> directly (e.g. "qca,check-sgmii-link"), and if present, only then implement
> the sgmii link check?  So in at803x_probe(), I would do something like this:
> 
> 	if (device_property_read_bool(&phydev->mdio.dev,
> 					"qca,check-sgmii-link")
> 		priv->check_sgmii_link = true;
> 

Zefir is currently on holiday and will probably get these emails when he gets back around 1. June

I think you missunderstand what "the link appears up but no packets go through means".
There are 2 pages which each reflect a side of the PHY:
* copper side
* SGMII side
Until the patch only the copper side was ever considered when reporting the state of the link.
This lead to the situation that the copper side was up (and the link reported up), but the SGMII side didn't come up.
It could well be a bad combination of CPU and the AR8031/33.
We know of at least 1 CPU (Freescale P1010) which has erratas for certain revisions regarding this.
If the SGMII side doesn't come, well no packets can go through.

With the patch: When the copper side is seen as up, it also checks if aneg of the SGMII link is done.
As far as i know SGMII can not be run without aneg, since it is always Gbit with aneg mandatory.
If SGMII aneg is not done, then, well aneg is not done and thus 0 is returned.

Internally we have this patch extended so we don't only report that aneg is not done but also reset the link.
Eventually aneg on the SGMII side can be completed and the link comes up.


Why do you think that frames are able to go through when aneg is reported as not done by the PHY?
Since aneg is mandatory for SGMII this can as well be seen as "link not up", not?

BR
Matthias
Timur Tabi May 24, 2017, 1:29 p.m. UTC | #25
On 5/24/17 2:18 AM, Matthias May wrote:
> With the patch: When the copper side is seen as up, it also checks if aneg of the SGMII link is done.
> As far as i know SGMII can not be run without aneg, since it is always Gbit with aneg mandatory.
> If SGMII aneg is not done, then, well aneg is not done and thus 0 is returned.
> 
> Internally we have this patch extended so we don't only report that aneg is not done but also reset the link.
> Eventually aneg on the SGMII side can be completed and the link comes up.

I would really like to test this patch.

> Why do you think that frames are able to go through when aneg is reported as not done by the PHY?

I have two theories:

1. The warning message is bogus.  The link actually is okay, but the 
driver thinks that it isn't.

2. The link is not okay, but it automatically fixes itself soon after 
the at803x_aneg_done() finishes.

> Since aneg is mandatory for SGMII this can as well be seen as "link not up", not?

The problem is that even though the link is up, the driver has returned 
"0", so the kernel thinks that autonegotiation has not finished. 
at803x_aneg_done() is never called again, and so I think the kernel is 
disabling the interface is some secret way.
Andrew Lunn May 24, 2017, 1:40 p.m. UTC | #26
> >Since aneg is mandatory for SGMII this can as well be seen as "link not up", not?
> 
> The problem is that even though the link is up, the driver has
> returned "0", so the kernel thinks that autonegotiation has not
> finished.

You need to prove this, that the link is not up. Any by link, we mean
both the copper and the SGMII link. 

> at803x_aneg_done() is never called again, and so I think
> the kernel is disabling the interface is some secret way.

Well, the driver has told the core that the link is not up. So the
kernel is waiting for another interrupt indicating the link has gone
up. And probably, this second interrupt never happens.

And it is not disabling the interface. Since the PHY is still down,
the core has not called netif_carrier_on().

    Andrew
Timur Tabi May 24, 2017, 1:48 p.m. UTC | #27
On 5/24/17 8:40 AM, Andrew Lunn wrote:

> You need to prove this, that the link is not up. Any by link, we mean
> both the copper and the SGMII link.

I can post the log of my iperf run showing that the, when 
at803x_aneg_done() returns zero, no packets can go through.  And then 
after I change at803x_aneg_done() so that it returns BMSR_ANEGCOMPLETE, 
then packets do go through.  Is that the proof you're looking for?

The current work-around that we're using internally is to blacklist the 
at803x driver.  This forces the kernel to use the genphy driver instead. 
  Everything works when we do this.

>> at803x_aneg_done() is never called again, and so I think
>> the kernel is disabling the interface is some secret way.
> 
> Well, the driver has told the core that the link is not up. So the
> kernel is waiting for another interrupt indicating the link has gone
> up. And probably, this second interrupt never happens.

Exxactly.  That's because the link IS up, and so there is no opportunity 
to receive another interrupt.

> And it is not disabling the interface. Since the PHY is still down,
> the core has not called netif_carrier_on().

Ok, I should have said "not enabled" instead of "disabled".  Thanks.
Andrew Lunn May 24, 2017, 2:09 p.m. UTC | #28
On Wed, May 24, 2017 at 08:48:04AM -0500, Timur Tabi wrote:
> On 5/24/17 8:40 AM, Andrew Lunn wrote:
> 
> >You need to prove this, that the link is not up. Any by link, we mean
> >both the copper and the SGMII link.
> 
> I can post the log of my iperf run showing that the, when
> at803x_aneg_done() returns zero, no packets can go through.  And
> then after I change at803x_aneg_done() so that it returns
> BMSR_ANEGCOMPLETE, then packets do go through.  Is that the proof
> you're looking for?

No. I would like to see the status of the copper side and the status
of the SGMII side, at the point at803x_aneg_done() is called.

If the copper side is up, but the SGMII side is down, returning 0 is
correct.

> Exxactly.  That's because the link IS up, and so there is no
> opportunity to receive another interrupt.

It could be, the copper side is up, but the SGMII side is down, at the
point at803x_aneg_done() is called. So it is correctly returning
0. Sometime later the SGMII side goes up, but there is not a second
interrupt. Hence the phy core does not know that the full, 2 stage MAC
to PHY to peer PHY link is now up.

	Andrew
Timur Tabi May 24, 2017, 6:58 p.m. UTC | #29
On 05/24/2017 09:09 AM, Andrew Lunn wrote:
> It could be, the copper side is up, but the SGMII side is down, at the
> point at803x_aneg_done() is called. So it is correctly returning
> 0. Sometime later the SGMII side goes up, but there is not a second
> interrupt. Hence the phy core does not know that the full, 2 stage MAC
> to PHY to peer PHY link is now up.

Ok, I'm going to debug this some more.  It turns out that the MAC side of
the SGMII link can send an interrupt when it thinks that auto-negotiation is
done.  I might be able to use this.

What function should my MAC driver call when it wants the phy core to call
at803x_aneg_done again to see if autonegotiation is done?

Also, is there a way for the MAC driver to know that at803x_aneg_done()
previously returned 0, and that it needs to tell the phy core to check again?
Andrew Lunn May 24, 2017, 7:34 p.m. UTC | #30
On Wed, May 24, 2017 at 01:58:09PM -0500, Timur Tabi wrote:
> On 05/24/2017 09:09 AM, Andrew Lunn wrote:
> > It could be, the copper side is up, but the SGMII side is down, at the
> > point at803x_aneg_done() is called. So it is correctly returning
> > 0. Sometime later the SGMII side goes up, but there is not a second
> > interrupt. Hence the phy core does not know that the full, 2 stage MAC
> > to PHY to peer PHY link is now up.
> 
> Ok, I'm going to debug this some more.  It turns out that the MAC side of
> the SGMII link can send an interrupt when it thinks that auto-negotiation is
> done.  I might be able to use this.

You can use this for your board. But it still leaves the phy driver
broken for everybody else.
 
> What function should my MAC driver call when it wants the phy core to call
> at803x_aneg_done again to see if autonegotiation is done?

You want to trigger the PHY state machine. There is only one exported
API call to do this, phy_mac_interrupt(). But you are supposed to pass
the new link state. And your MAC driver has no idea of that, it does
not know if the copper side of the PHY is up.

So it might be better if you export phy_trigger_machine().

> Also, is there a way for the MAC driver to know that at803x_aneg_done()
> previously returned 0, and that it needs to tell the phy core to check again?

Not that i know of. The MAC layer is not supposed to be messing around
in the PHY layer. However, just triggering the PHY state machine
should be enough.

   Andrew
Timur Tabi May 24, 2017, 8:57 p.m. UTC | #31
On 05/24/2017 02:34 PM, Andrew Lunn wrote:
>> Ok, I'm going to debug this some more.  It turns out that the MAC side of
>> the SGMII link can send an interrupt when it thinks that auto-negotiation is
>> done.  I might be able to use this.
>
> You can use this for your board. But it still leaves the phy driver
> broken for everybody else.

Wait, I thought you said the at803x driver was not broken, since it 
returns 0 when the SGMII side of the link hasn't finished auto-negotiating?

>> What function should my MAC driver call when it wants the phy core to call
>> at803x_aneg_done again to see if autonegotiation is done?
>
> You want to trigger the PHY state machine. There is only one exported
> API call to do this, phy_mac_interrupt(). But you are supposed to pass
> the new link state. And your MAC driver has no idea of that, it does
> not know if the copper side of the PHY is up.

My NIC has a feature called autopolling where it takes over the MDIO bus 
and regularly polls the link state.  When it detects that the link state 
has changed, it generates a MAC interrupt.  This is when I call 
phy_mac_interrupt() normally.

I think I can use the SGMII interrupt to also call phy_mac_interrupt(). 
The problem is the from the copper side, the link is already up, so if I 
call phy_mac_interrupt() again and say the link is up, the phy core is 
going to ignore that.

> So it might be better if you export phy_trigger_machine().

I'll test that, but that does seem a bit hackish.

>> Also, is there a way for the MAC driver to know that at803x_aneg_done()
>> previously returned 0, and that it needs to tell the phy core to check again?
>
> Not that i know of. The MAC layer is not supposed to be messing around
> in the PHY layer. However, just triggering the PHY state machine
> should be enough.

Can you tell my how PHY_HAS_INTERRUPT is supposed to work?  How does the 
PHY send an interrupt?

I'm starting to think that my NIC's autopolling feature is not 
compatible with phylib, and that I should use polling mode.
Andrew Lunn May 24, 2017, 9:15 p.m. UTC | #32
On Wed, May 24, 2017 at 03:57:06PM -0500, Timur Tabi wrote:
> On 05/24/2017 02:34 PM, Andrew Lunn wrote:
> >>Ok, I'm going to debug this some more.  It turns out that the MAC side of
> >>the SGMII link can send an interrupt when it thinks that auto-negotiation is
> >>done.  I might be able to use this.
> >
> >You can use this for your board. But it still leaves the phy driver
> >broken for everybody else.
> 
> Wait, I thought you said the at803x driver was not broken, since it
> returns 0 when the SGMII side of the link hasn't finished
> auto-negotiating?

It is correct so far. But to work, it needs to interrupt again once
the SGMII side has come up. Only then have we link.

> My NIC has a feature called autopolling where it takes over the MDIO
> bus and regularly polls the link state.  When it detects that the
> link state has changed, it generates a MAC interrupt.  This is when
> I call phy_mac_interrupt() normally.

Unfortunately, you need to keep this feature turned off. It will not
respect the phydev mutex. It has no idea what page has been currently
selected. It probably has no way to flip the page and see if the SGMII
link is up. etc.

> Can you tell my how PHY_HAS_INTERRUPT is supposed to work?  How does
> the PHY send an interrupt?

Generally, the PHY interrupt pin is connected to a GPIO. You then use
the GPIO as an interrupt source. So it has an interrupt number. Put
that in phydev->irq, eg using the interrupts property in device tree.
The core will register an interrupt handler, and enable the
interrupt. When it receives an interrupt, it calls the phy driver to
service the interrupt.

	Andrew
Florian Fainelli May 24, 2017, 9:19 p.m. UTC | #33
On 05/24/2017 01:57 PM, Timur Tabi wrote:
> On 05/24/2017 02:34 PM, Andrew Lunn wrote:
>>> Ok, I'm going to debug this some more.  It turns out that the MAC
>>> side of
>>> the SGMII link can send an interrupt when it thinks that
>>> auto-negotiation is
>>> done.  I might be able to use this.
>>
>> You can use this for your board. But it still leaves the phy driver
>> broken for everybody else.
> 
> Wait, I thought you said the at803x driver was not broken, since it
> returns 0 when the SGMII side of the link hasn't finished auto-negotiating?
> 
>>> What function should my MAC driver call when it wants the phy core to
>>> call
>>> at803x_aneg_done again to see if autonegotiation is done?
>>
>> You want to trigger the PHY state machine. There is only one exported
>> API call to do this, phy_mac_interrupt(). But you are supposed to pass
>> the new link state. And your MAC driver has no idea of that, it does
>> not know if the copper side of the PHY is up.
> 
> My NIC has a feature called autopolling where it takes over the MDIO bus
> and regularly polls the link state.  When it detects that the link state
> has changed, it generates a MAC interrupt.  This is when I call
> phy_mac_interrupt() normally.
> 
> I think I can use the SGMII interrupt to also call phy_mac_interrupt().
> The problem is the from the copper side, the link is already up, so if I
> call phy_mac_interrupt() again and say the link is up, the phy core is
> going to ignore that.

The question is what the HW is auto-polling on? Most HW implementations
that I have seen either auto-poll and assume that BMSR.LINKSTATUS will
reflect what they want to, or they even let you specify which register
and bit(s) to monitor for link status. So which one is it here?

As Andrew already responded, this clashes with any reads/writes that the
PHY library could do, unless your HW is smart enough to serialize MDIO
reads/writes?

> 
>> So it might be better if you export phy_trigger_machine().
> 
> I'll test that, but that does seem a bit hackish.
> 
>>> Also, is there a way for the MAC driver to know that at803x_aneg_done()
>>> previously returned 0, and that it needs to tell the phy core to
>>> check again?
>>
>> Not that i know of. The MAC layer is not supposed to be messing around
>> in the PHY layer. However, just triggering the PHY state machine
>> should be enough.
> 
> Can you tell my how PHY_HAS_INTERRUPT is supposed to work?  How does the
> PHY send an interrupt?

PHY_HAS_INTERRUPT indicates that you have a dedicated interrupt line for
your PHY device and that you have the ability to implement a custom
interrupt handling callback in the PHY driver (ack_interrupt) that lets
you deal with all possible sorts of the interrupts that the PHY could
generate.

> 
> I'm starting to think that my NIC's autopolling feature is not
> compatible with phylib, and that I should use polling mode.

That's pretty much what Andrew told you about 5 emails ago...
Timur Tabi May 24, 2017, 9:20 p.m. UTC | #34
On 05/24/2017 04:15 PM, Andrew Lunn wrote:
>> My NIC has a feature called autopolling where it takes over the MDIO
>> bus and regularly polls the link state.  When it detects that the
>> link state has changed, it generates a MAC interrupt.  This is when
>> I call phy_mac_interrupt() normally.

> Unfortunately, you need to keep this feature turned off. It will not
> respect the phydev mutex. It has no idea what page has been currently
> selected. It probably has no way to flip the page and see if the SGMII
> link is up. etc.

phydev mutex?  And what do you mean by page?

I forgot one detail.  Every time you do an MDIO read/write, it 
temporarily disables the feature.  Although, I think that's not relevant 
to your point.

Disabling this feature and switching from PHY_IGNORE_INTERRUPT to 
PHY_POLL might fix everything.  I will try it.
Florian Fainelli May 24, 2017, 9:28 p.m. UTC | #35
On 05/24/2017 02:20 PM, Timur Tabi wrote:
> On 05/24/2017 04:15 PM, Andrew Lunn wrote:
>>> My NIC has a feature called autopolling where it takes over the MDIO
>>> bus and regularly polls the link state.  When it detects that the
>>> link state has changed, it generates a MAC interrupt.  This is when
>>> I call phy_mac_interrupt() normally.
> 
>> Unfortunately, you need to keep this feature turned off. It will not
>> respect the phydev mutex. It has no idea what page has been currently
>> selected. It probably has no way to flip the page and see if the SGMII
>> link is up. etc.
> 
> phydev mutex?  And what do you mean by page?

Yes phydev->lock which is used to serialize the state machine state changes.

Most PHYs have many more registers than the 15 standard exposed
directly, and so you need indirect reads/writes to access these
registers, which typically involve switching a particular page, doing
the indirect register access, and then flipping the page back. If you
interrupt that scheme one way or another, your reads and writes are all
messed up.

> 
> I forgot one detail.  Every time you do an MDIO read/write, it
> temporarily disables the feature.  Although, I think that's not relevant
> to your point.

Is that done by the HW itself, or is this under SW control exclusively.

> 
> Disabling this feature and switching from PHY_IGNORE_INTERRUPT to
> PHY_POLL might fix everything.  I will try it.
> 

Humm yes, that seems like a worthwhile exercise at least.
Timur Tabi May 24, 2017, 9:32 p.m. UTC | #36
On 05/24/2017 04:28 PM, Florian Fainelli wrote:

> Yes phydev->lock which is used to serialize the state machine state changes.
>
> Most PHYs have many more registers than the 15 standard exposed
> directly, and so you need indirect reads/writes to access these
> registers, which typically involve switching a particular page, doing
> the indirect register access, and then flipping the page back. If you
> interrupt that scheme one way or another, your reads and writes are all
> messed up.

Ah, and the at803x is a device like that.

At worst, the autopoll feature could read a register from the wrong 
page, and think that the link state has changed when it hasn't.  But 
that's still bad, and all my problems do revolve around link states.

>> I forgot one detail.  Every time you do an MDIO read/write, it
>> temporarily disables the feature.  Although, I think that's not relevant
>> to your point.
>
> Is that done by the HW itself, or is this under SW control exclusively.

Software.
Florian Fainelli May 24, 2017, 9:36 p.m. UTC | #37
On 05/24/2017 02:32 PM, Timur Tabi wrote:
> On 05/24/2017 04:28 PM, Florian Fainelli wrote:
> 
>> Yes phydev->lock which is used to serialize the state machine state
>> changes.
>>
>> Most PHYs have many more registers than the 15 standard exposed
>> directly, and so you need indirect reads/writes to access these
>> registers, which typically involve switching a particular page, doing
>> the indirect register access, and then flipping the page back. If you
>> interrupt that scheme one way or another, your reads and writes are all
>> messed up.
> 
> Ah, and the at803x is a device like that.
> 
> At worst, the autopoll feature could read a register from the wrong
> page, and think that the link state has changed when it hasn't.  But
> that's still bad, and all my problems do revolve around link states.

Absolutely, just like it's not clear whether autopoll does read
BMSR.LINKSTAT or another at803x specific register? Also how is
BMSR.LINKSTAT defined on at803x when there is SGMII + Copper involved?

> 
>>> I forgot one detail.  Every time you do an MDIO read/write, it
>>> temporarily disables the feature.  Although, I think that's not relevant
>>> to your point.
>>
>> Is that done by the HW itself, or is this under SW control exclusively.
> 
> Software.

OK, and there is no way you can run into the following race condition:

CPU				HW
MDIO read intent
				polling starts
disable HW autopoll
				polling continues
MDIO read is done
				MDIO read done
				polling stops
MDIO read value returned


if you disable autopolling in HW this is guaranteed to immediately stop
by the time the register value is seen in HW and your I/O read/write
returns?
Timur Tabi May 24, 2017, 10:03 p.m. UTC | #38
On 05/24/2017 04:36 PM, Florian Fainelli wrote:
> OK, and there is no way you can run into the following race condition:
>
> CPU				HW
> MDIO read intent
> 				polling starts
> disable HW autopoll
> 				polling continues

Disabling of the HW autopoll waits for the poll to actually stop before 
continuing.  You can see the code here:

http://elixir.free-electrons.com/linux/latest/source/drivers/net/ethernet/qualcomm/emac/emac-phy.c#L102

> MDIO read is done
> 				MDIO read done
> 				polling stops
> MDIO read value returned
>
>
> if you disable autopolling in HW this is guaranteed to immediately stop
> by the time the register value is seen in HW and your I/O read/write
> returns?

It doesn't immediately stop, but the emac_phy_mdio_autopoll_disable() 
function waits for the MDIO bus to not be busy.  But low-level details 
of this feature are not documented, so who knows what it does exactly? 
The original code that used this feature only supported one PHY and 
never expected there to be any asynchronous MDIO transactios.
Zefir Kurtisi June 1, 2017, 11:45 a.m. UTC | #39
Hello Timur, all,

sorry for being late to chime in here, as Matthias said I have been AFK while this
discussion took place.

The related patch I committed is meant to double check that the SGMII side is done
aneg when copper side is. Usually it does, but this expectation admittedly might
fail in interrupt mode. What seems to happen is: after a power-down-up cycle the
at803x restarts aneg at copper and SGMII sides. In polling mode, if you run into
the case where copper is up but the SGMII is not, you'll see the message and in
(one of) the next poll cycle(s) SGMII should be ready too and the situation
recovers. In interrupt mode, obviously once you hit into the partially done aneg,
you won't recover since no additional link-change interrupt is going to happen.

Wit that, Tamur is right claiming the double-check breaks the driver in interrupt
mode. I could think of several work-arounds to fix it, e.g.
1) check if there is an SGMII link-change interrupt source and enable it
2) when we run into partial aneg completion, temporary switch to polling mode and
back to interrupt mode when at803x_aneg_done() succeeds


Alas, we need to rewind back to whether these double-checks are required at all.
As Matthias explained, there is at least one combination of devices that have
documented issues establishing the SGMII link - in our case it was the at8031
attached to a P1010's eTSEC (errata A-004187, fixed with chip rev. 2.01). Chances
are high that our company is the only one using this exact early version of
problematic devices. Therefore, and since the problem is at ETH side, we moved our
local workarounds into the gianfar driver who on demand restarts SGMII to recover
from the stale link.

I guess we need to decide whether we generally need to handle permanent aneg
failures on the SGMII link. If we expect that it must not fail (like we assumed
until we saw it failing), I agree with Timur and support reverting of the related
commit f62265b53e. If otherwise we want this potential failure to be handled
correctly, things become arbitrary complex. Essentially, we need to handle such
PHYs as a combination of their two sides (copper + SGMII) as virtual sub-PHYs. The
phylib might support that in a future version, but for now this seems like a lot
of work required to handle a rare problem.



tl;dr: ACK with reverting f62265b53e


Cheers,
Zefir


On 05/24/2017 03:29 PM, Timur Tabi wrote:
> On 5/24/17 2:18 AM, Matthias May wrote:
>> With the patch: When the copper side is seen as up, it also checks if aneg of
>> the SGMII link is done.
>> As far as i know SGMII can not be run without aneg, since it is always Gbit with
>> aneg mandatory.
>> If SGMII aneg is not done, then, well aneg is not done and thus 0 is returned.
>>
>> Internally we have this patch extended so we don't only report that aneg is not
>> done but also reset the link.
>> Eventually aneg on the SGMII side can be completed and the link comes up.
> 
> I would really like to test this patch.
> 
>> Why do you think that frames are able to go through when aneg is reported as not
>> done by the PHY?
> 
> I have two theories:
> 
> 1. The warning message is bogus.  The link actually is okay, but the driver thinks
> that it isn't.
> 
> 2. The link is not okay, but it automatically fixes itself soon after the
> at803x_aneg_done() finishes.
> 
>> Since aneg is mandatory for SGMII this can as well be seen as "link not up", not?
> 
> The problem is that even though the link is up, the driver has returned "0", so
> the kernel thinks that autonegotiation has not finished. at803x_aneg_done() is
> never called again, and so I think the kernel is disabling the interface is some
> secret way.
>
Timur Tabi June 1, 2017, 2:48 p.m. UTC | #40
On 06/01/2017 06:45 AM, Zefir Kurtisi wrote:
> I guess we need to decide whether we generally need to handle permanent aneg
> failures on the SGMII link. If we expect that it must not fail (like we assumed
> until we saw it failing), I agree with Timur and support reverting of the related
> commit f62265b53e. If otherwise we want this potential failure to be handled
> correctly, things become arbitrary complex. Essentially, we need to handle such
> PHYs as a combination of their two sides (copper + SGMII) as virtual sub-PHYs. The
> phylib might support that in a future version, but for now this seems like a lot
> of work required to handle a rare problem.

I'm about to post a patch that removes interrupt support from the EMAC 
driver and relies on software polling of the PHY.  With this patch, we 
don't see the "link is not okay" message from that at803x driver any more.

The link state is generally more reliable now, even when the at803x 
driver doesn't complain.

My theory is that the hardware polling of the PHY is just too 
aggressive.  I think it continuously reads the PHY status register at 
maximum speed and immediately issues an interrupt when the PHY says that 
it's up.

So I think we're okay with leaving the at803x driver as-is, since we 
appear to be no longer getting any false failures.
diff mbox

Patch

diff --git a/drivers/net/phy/at803x.c b/drivers/net/phy/at803x.c
index cf74d10..a52b560 100644
--- a/drivers/net/phy/at803x.c
+++ b/drivers/net/phy/at803x.c
@@ -42,10 +42,18 @@ 
 #define AT803X_MMD_ACCESS_CONTROL		0x0D
 #define AT803X_MMD_ACCESS_CONTROL_DATA		0x0E
 #define AT803X_FUNC_DATA			0x4003
+#define AT803X_REG_CHIP_CONFIG			0x1f
+#define AT803X_BT_BX_REG_SEL			0x8000
 
 #define AT803X_DEBUG_ADDR			0x1D
 #define AT803X_DEBUG_DATA			0x1E
 
+#define AT803X_MODE_CFG_MASK			0x0F
+#define AT803X_MODE_CFG_SGMII			0x01
+
+#define AT803X_PSSR			0x11	/*PHY-Specific Status Register*/
+#define AT803X_PSSR_MR_AN_COMPLETE	0x0200
+
 #define AT803X_DEBUG_REG_0			0x00
 #define AT803X_DEBUG_RX_CLK_DLY_EN		BIT(15)
 
@@ -355,6 +363,36 @@  static void at803x_link_change_notify(struct phy_device *phydev)
 	}
 }
 
+static int at803x_aneg_done(struct phy_device *phydev)
+{
+	int ccr;
+
+	int aneg_done = genphy_aneg_done(phydev);
+	if (aneg_done != BMSR_ANEGCOMPLETE)
+		return aneg_done;
+
+	/*
+	 * in SGMII mode, if copper side autoneg is successful,
+	 * also check SGMII side autoneg result
+	 */
+	ccr = phy_read(phydev, AT803X_REG_CHIP_CONFIG);
+	if ((ccr & AT803X_MODE_CFG_MASK) != AT803X_MODE_CFG_SGMII)
+		return aneg_done;
+
+	/* switch to SGMII/fiber page */
+	phy_write(phydev, AT803X_REG_CHIP_CONFIG, ccr & ~AT803X_BT_BX_REG_SEL);
+
+	/* check if the SGMII link is OK. */
+	if (!(phy_read(phydev, AT803X_PSSR) & AT803X_PSSR_MR_AN_COMPLETE)) {
+		pr_warn("803x_aneg_done: SGMII link is not ok\n");
+		aneg_done = 0;
+	}
+	/* switch back to copper page */
+	phy_write(phydev, AT803X_REG_CHIP_CONFIG, ccr | AT803X_BT_BX_REG_SEL);
+
+	return aneg_done;
+}
+
 static struct phy_driver at803x_driver[] = {
 {
 	/* ATHEROS 8035 */
@@ -406,6 +444,7 @@  static struct phy_driver at803x_driver[] = {
 	.flags			= PHY_HAS_INTERRUPT,
 	.config_aneg		= genphy_config_aneg,
 	.read_status		= genphy_read_status,
+	.aneg_done		= at803x_aneg_done,
 	.ack_interrupt		= &at803x_ack_interrupt,
 	.config_intr		= &at803x_config_intr,
 } };