diff mbox series

[RFC,net-next,6/8] net: phylink: Configure MAC/PCS when link is up without PHY

Message ID 9a2136885d9a892ff170be88fdffeda82c778a10.1580122909.git.Jose.Abreu@synopsys.com
State RFC
Delegated to: David Miller
Headers show
Series net: Add support for Synopsys DesignWare XPCS | expand

Commit Message

Jose Abreu Jan. 27, 2020, 11:09 a.m. UTC
When we don't have any real PHY driver connected and we get link up from
PCS we shall configure MAC and PCS for the desired speed and also
resolve the flow control settings from MAC side.

Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com>

---
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Alexandre Torgue <alexandre.torgue@st.com>
Cc: Jose Abreu <joabreu@synopsys.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Heiner Kallweit <hkallweit1@gmail.com>
Cc: netdev@vger.kernel.org
Cc: linux-stm32@st-md-mailman.stormreply.com
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
---
 drivers/net/phy/phylink.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

Comments

Russell King (Oracle) Jan. 27, 2020, 11:21 a.m. UTC | #1
On Mon, Jan 27, 2020 at 12:09:11PM +0100, Jose Abreu wrote:
> When we don't have any real PHY driver connected and we get link up from
> PCS we shall configure MAC and PCS for the desired speed and also
> resolve the flow control settings from MAC side.

This is certainly the wrong place for it.  Please hold off on this patch
for the time being.  Thanks.

> 
> Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com>
> 
> ---
> Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
> Cc: Alexandre Torgue <alexandre.torgue@st.com>
> Cc: Jose Abreu <joabreu@synopsys.com>
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: Jakub Kicinski <kuba@kernel.org>
> Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
> Cc: Russell King <linux@armlinux.org.uk>
> Cc: Andrew Lunn <andrew@lunn.ch>
> Cc: Florian Fainelli <f.fainelli@gmail.com>
> Cc: Heiner Kallweit <hkallweit1@gmail.com>
> Cc: netdev@vger.kernel.org
> Cc: linux-stm32@st-md-mailman.stormreply.com
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: linux-kernel@vger.kernel.org
> ---
>  drivers/net/phy/phylink.c | 14 ++++++++++++--
>  1 file changed, 12 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c
> index 4174d874b1f7..75dbaf80d7a5 100644
> --- a/drivers/net/phy/phylink.c
> +++ b/drivers/net/phy/phylink.c
> @@ -533,10 +533,20 @@ static void phylink_resolve(struct work_struct *w)
>  
>  	if (link_changed) {
>  		pl->old_link_state = link_state.link;
> -		if (!link_state.link)
> +		if (!link_state.link) {
>  			phylink_mac_link_down(pl);
> -		else
> +		} else {
> +			/* If no PHY is connected, we still need to configure
> +			 * MAC/PCS for flow control and speed.
> +			 */
> +			if (!pl->phydev) {
> +				phylink_resolve_flow(pl, &link_state);
> +				phylink_mac_config(pl, &link_state);
> +			}
> +
>  			phylink_mac_link_up(pl, link_state);
> +		}
> +
>  	}
>  	if (!link_state.link && pl->mac_link_dropped) {
>  		pl->mac_link_dropped = false;
> -- 
> 2.7.4
> 
>
Jose Abreu Jan. 27, 2020, 11:38 a.m. UTC | #2
From: Russell King - ARM Linux admin <linux@armlinux.org.uk>
Date: Jan/27/2020, 11:21:02 (UTC+00:00)

> On Mon, Jan 27, 2020 at 12:09:11PM +0100, Jose Abreu wrote:
> > When we don't have any real PHY driver connected and we get link up from
> > PCS we shall configure MAC and PCS for the desired speed and also
> > resolve the flow control settings from MAC side.
> 
> This is certainly the wrong place for it.  Please hold off on this patch
> for the time being.  Thanks.

This is actually the change that makes everything work ...

I need to configure PCS before Aneg is complete and then I need to 
configure MAC once Aneg is done and link is up with the outcome speed and 
flow control.

---
Thanks,
Jose Miguel Abreu
Russell King (Oracle) Jan. 27, 2020, 11:46 a.m. UTC | #3
On Mon, Jan 27, 2020 at 11:38:05AM +0000, Jose Abreu wrote:
> From: Russell King - ARM Linux admin <linux@armlinux.org.uk>
> Date: Jan/27/2020, 11:21:02 (UTC+00:00)
> 
> > On Mon, Jan 27, 2020 at 12:09:11PM +0100, Jose Abreu wrote:
> > > When we don't have any real PHY driver connected and we get link up from
> > > PCS we shall configure MAC and PCS for the desired speed and also
> > > resolve the flow control settings from MAC side.
> > 
> > This is certainly the wrong place for it.  Please hold off on this patch
> > for the time being.  Thanks.
> 
> This is actually the change that makes everything work ...
> 
> I need to configure PCS before Aneg is complete and then I need to 
> configure MAC once Aneg is done and link is up with the outcome speed and 
> flow control.

Yes, I realise that, but it comes with the expense of potentially
breaking mvneta and mvpp2, where the settings are automatically
passed between the PCS and MAC in hardware. I also believe DSA
works around this, and I need to look at that.

However, right now I'm in the middle of rebasing my git tree on
top of v5.5 and can't say much more until I've finished that.
Jose Abreu Jan. 27, 2020, 12:50 p.m. UTC | #4
From: Russell King - ARM Linux admin <linux@armlinux.org.uk>
Date: Jan/27/2020, 11:46:00 (UTC+00:00)

> On Mon, Jan 27, 2020 at 11:38:05AM +0000, Jose Abreu wrote:
> > From: Russell King - ARM Linux admin <linux@armlinux.org.uk>
> > Date: Jan/27/2020, 11:21:02 (UTC+00:00)
> > 
> > > On Mon, Jan 27, 2020 at 12:09:11PM +0100, Jose Abreu wrote:
> > > > When we don't have any real PHY driver connected and we get link up from
> > > > PCS we shall configure MAC and PCS for the desired speed and also
> > > > resolve the flow control settings from MAC side.
> > > 
> > > This is certainly the wrong place for it.  Please hold off on this patch
> > > for the time being.  Thanks.
> > 
> > This is actually the change that makes everything work ...
> > 
> > I need to configure PCS before Aneg is complete and then I need to 
> > configure MAC once Aneg is done and link is up with the outcome speed and 
> > flow control.
> 
> Yes, I realise that, but it comes with the expense of potentially
> breaking mvneta and mvpp2, where the settings are automatically
> passed between the PCS and MAC in hardware. I also believe DSA
> works around this, and I need to look at that.

OK so there is one alternative solution for this that's just saving the 
last link status on stmmac internal structure (if applicable ofc, 
something like an_complete is true and link is true) and then just use 
that info in mac_link_up() callback to configure the MAC when PCS is in 
use.

This patch could then be dropped.

---
Thanks,
Jose Miguel Abreu
Russell King (Oracle) Jan. 27, 2020, 1:37 p.m. UTC | #5
On Mon, Jan 27, 2020 at 12:50:54PM +0000, Jose Abreu wrote:
> From: Russell King - ARM Linux admin <linux@armlinux.org.uk>
> Date: Jan/27/2020, 11:46:00 (UTC+00:00)
> 
> > On Mon, Jan 27, 2020 at 11:38:05AM +0000, Jose Abreu wrote:
> > > From: Russell King - ARM Linux admin <linux@armlinux.org.uk>
> > > Date: Jan/27/2020, 11:21:02 (UTC+00:00)
> > > 
> > > > On Mon, Jan 27, 2020 at 12:09:11PM +0100, Jose Abreu wrote:
> > > > > When we don't have any real PHY driver connected and we get link up from
> > > > > PCS we shall configure MAC and PCS for the desired speed and also
> > > > > resolve the flow control settings from MAC side.
> > > > 
> > > > This is certainly the wrong place for it.  Please hold off on this patch
> > > > for the time being.  Thanks.
> > > 
> > > This is actually the change that makes everything work ...
> > > 
> > > I need to configure PCS before Aneg is complete and then I need to 
> > > configure MAC once Aneg is done and link is up with the outcome speed and 
> > > flow control.
> > 
> > Yes, I realise that, but it comes with the expense of potentially
> > breaking mvneta and mvpp2, where the settings are automatically
> > passed between the PCS and MAC in hardware. I also believe DSA
> > works around this, and I need to look at that.
> 
> OK so there is one alternative solution for this that's just saving the 
> last link status on stmmac internal structure (if applicable ofc, 
> something like an_complete is true and link is true) and then just use 
> that info in mac_link_up() callback to configure the MAC when PCS is in 
> use.

I'm not disagreeing that something needs to be done - the assumption
in phylink that the MAC and PCS talk to each other is one that comes
from the hardware which it was developed on, but is not true for all
hardware.  The IEEE 802.3 model doesn't include that behaviour.

So please, don't try to come up with an alternative solution; this
problem _does_ need solving in phylink, but it needs to be done in a
way that doesn't regress the existing users.

I've already started to split the current set of MAC operations into
a purely MAC set of operations and a set of PCS operations, but still,
the problem of how to sensibly deal with mvneta and mvpp2 remain.

The problem is that both these use two registers to control both the
PCS and MAC.  One is a control register, which controls what is
advertised, whether AN is used, what is negotiated and what is forced,
including whether the link is forced up.

The other is a status register that gives the status of the MAC -
whether tx pause and/or rx pause is enabled, what speed and duplex the
MAC is running at, whether the link is in sync, whether the link is up
etc.

Essentially, the PCS and MAC are tightly integrated together in this
hardware, so splitting this into separate PCS and MAC control blocks is
very problematical.

As I say, this is something that needs solving.  A solution needs to be
found, rather than having lots of drivers working around this issue in
their own special ways, and my fear is that the more workarounds we
have, the more the phylink core will become unmaintainable.

So please, no workarounds.
Jose Abreu Jan. 27, 2020, 1:57 p.m. UTC | #6
From: Russell King - ARM Linux admin <linux@armlinux.org.uk>
Date: Jan/27/2020, 13:37:52 (UTC+00:00)

> On Mon, Jan 27, 2020 at 12:50:54PM +0000, Jose Abreu wrote:
> > From: Russell King - ARM Linux admin <linux@armlinux.org.uk>
> > Date: Jan/27/2020, 11:46:00 (UTC+00:00)
> > 
> > > On Mon, Jan 27, 2020 at 11:38:05AM +0000, Jose Abreu wrote:
> > > > From: Russell King - ARM Linux admin <linux@armlinux.org.uk>
> > > > Date: Jan/27/2020, 11:21:02 (UTC+00:00)
> > > > 
> > > > > On Mon, Jan 27, 2020 at 12:09:11PM +0100, Jose Abreu wrote:
> > > > > > When we don't have any real PHY driver connected and we get link up from
> > > > > > PCS we shall configure MAC and PCS for the desired speed and also
> > > > > > resolve the flow control settings from MAC side.
> > > > > 
> > > > > This is certainly the wrong place for it.  Please hold off on this patch
> > > > > for the time being.  Thanks.
> > > > 
> > > > This is actually the change that makes everything work ...
> > > > 
> > > > I need to configure PCS before Aneg is complete and then I need to 
> > > > configure MAC once Aneg is done and link is up with the outcome speed and 
> > > > flow control.
> > > 
> > > Yes, I realise that, but it comes with the expense of potentially
> > > breaking mvneta and mvpp2, where the settings are automatically
> > > passed between the PCS and MAC in hardware. I also believe DSA
> > > works around this, and I need to look at that.
> > 
> > OK so there is one alternative solution for this that's just saving the 
> > last link status on stmmac internal structure (if applicable ofc, 
> > something like an_complete is true and link is true) and then just use 
> > that info in mac_link_up() callback to configure the MAC when PCS is in 
> > use.
> 
> I'm not disagreeing that something needs to be done - the assumption
> in phylink that the MAC and PCS talk to each other is one that comes
> from the hardware which it was developed on, but is not true for all
> hardware.  The IEEE 802.3 model doesn't include that behaviour.
> 
> So please, don't try to come up with an alternative solution; this
> problem _does_ need solving in phylink, but it needs to be done in a
> way that doesn't regress the existing users.
> 
> I've already started to split the current set of MAC operations into
> a purely MAC set of operations and a set of PCS operations, but still,
> the problem of how to sensibly deal with mvneta and mvpp2 remain.
> 
> The problem is that both these use two registers to control both the
> PCS and MAC.  One is a control register, which controls what is
> advertised, whether AN is used, what is negotiated and what is forced,
> including whether the link is forced up.
> 
> The other is a status register that gives the status of the MAC -
> whether tx pause and/or rx pause is enabled, what speed and duplex the
> MAC is running at, whether the link is in sync, whether the link is up
> etc.
> 
> Essentially, the PCS and MAC are tightly integrated together in this
> hardware, so splitting this into separate PCS and MAC control blocks is
> very problematical.
> 
> As I say, this is something that needs solving.  A solution needs to be
> found, rather than having lots of drivers working around this issue in
> their own special ways, and my fear is that the more workarounds we
> have, the more the phylink core will become unmaintainable.
> 
> So please, no workarounds.

I'm not trying to rush and I do agree with you. I just thought this was a 
particular case but I did tried to fix it in this commit.

I'm not familiar with mvneta HW but thanks for the description. Indeed, 
for XPCS and stmmac they are independent but MAC still needs to know the 
speed at least. My main problem here is USXGMII. This needs to know the 
speed negotiated taking into account MAC limitations and then use it to 
configure XPCS.

So, another possible solution is just to pass phylink_link_state struct to 
mac_link_up() and mac_link_down(). From the comment on the function and 
it's usage in phylink it seems this could be a good fit ?

Another thing I could solve here (while I'm at it), is PCS that sit in a 
MII bus: Instead of having stmmac saving all that private data of XPCS I 
could just pass it to phylink and save it there ? Something like you 
proposed in ("net: axienet: Fix SGMII support") ?

---
Thanks,
Jose Miguel Abreu
Andrew Lunn Jan. 27, 2020, 2 p.m. UTC | #7
> Yes, I realise that, but it comes with the expense of potentially
> breaking mvneta and mvpp2, where the settings are automatically
> passed between the PCS and MAC in hardware. I also believe DSA
> works around this, and I need to look at that.

Hi Russell

The mv88e6xxx driver has code for when SGMII is used. It transfers the
negotiated speed from the PCS to the MAC.

But it recently turned out something like this is also needed for
other link modes involving the SERDES. It used to work, i think
because Phylink would initially configure the MAC approximately right,
or the mv88e6xxx driver was looking at phylink state it should not.
But it no longer works.

I would like to see a generic solution, and would be happy to remove
the current SGMII code when you have something to replace it.

    Andrew
Russell King (Oracle) Jan. 27, 2020, 2:08 p.m. UTC | #8
On Mon, Jan 27, 2020 at 03:00:38PM +0100, Andrew Lunn wrote:
> > Yes, I realise that, but it comes with the expense of potentially
> > breaking mvneta and mvpp2, where the settings are automatically
> > passed between the PCS and MAC in hardware. I also believe DSA
> > works around this, and I need to look at that.
> 
> Hi Russell
> 
> The mv88e6xxx driver has code for when SGMII is used. It transfers the
> negotiated speed from the PCS to the MAC.
> 
> But it recently turned out something like this is also needed for
> other link modes involving the SERDES. It used to work, i think
> because Phylink would initially configure the MAC approximately right,
> or the mv88e6xxx driver was looking at phylink state it should not.
> But it no longer works.

Can you give a hint which platform this is and how to reproduce it
please?
Andrew Lunn Jan. 27, 2020, 2:51 p.m. UTC | #9
> Can you give a hint which platform this is and how to reproduce it
> please?

Hi Russell

Devel C has issues with its fibre ports. I tend to test with
sff2/port9 not sff3/port3, because i also have the copper port plugged
in. If the copper gets link before the fibre, copper is used.

What i see is that after the SERDES syncs, its registers indicate a 1G
link, full duplex, etc. But the MAC is using 10/Half. And hence no
packets go through. If i set the MAC to the same as the PCS, i can at
least transmit. Receive does not work, but i think that is something
else. The statistics counters indicate the SERDES is receiving frames,
but the MAC statistic counters suggests the MAC never sees them.

I've also had issues with the DSA links, also being configured to
10/Half. That seems to be related to having a phy-mode property in
device tree. I need to add a fixed-link property to set the correct
speed. Something is broken here, previously the fixed-link was only
needed if the speed needed to be lower than the ports maximum. I think
that is a separate issue i need to dig into, not part of the PCS to
MAC transfer.

Heiner has another device which has an Aquantia PHY running in an odd
mode so that it does 1G over a T2 link. It uses SGMII for this, and
that is where we first noticed the issue of the MAC and PCS having
different configurations.

      Andrew
Russell King (Oracle) Jan. 27, 2020, 4:11 p.m. UTC | #10
On Mon, Jan 27, 2020 at 03:51:07PM +0100, Andrew Lunn wrote:
> > Can you give a hint which platform this is and how to reproduce it
> > please?
> 
> Hi Russell
> 
> Devel C has issues with its fibre ports. I tend to test with
> sff2/port9 not sff3/port3, because i also have the copper port plugged
> in. If the copper gets link before the fibre, copper is used.
> 
> What i see is that after the SERDES syncs, its registers indicate a 1G
> link, full duplex, etc. But the MAC is using 10/Half. And hence no
> packets go through. If i set the MAC to the same as the PCS, i can at
> least transmit. Receive does not work, but i think that is something
> else. The statistics counters indicate the SERDES is receiving frames,
> but the MAC statistic counters suggests the MAC never sees them.
> 
> I've also had issues with the DSA links, also being configured to
> 10/Half. That seems to be related to having a phy-mode property in
> device tree. I need to add a fixed-link property to set the correct
> speed. Something is broken here, previously the fixed-link was only
> needed if the speed needed to be lower than the ports maximum. I think
> that is a separate issue i need to dig into, not part of the PCS to
> MAC transfer.

Presumably, all these should be visible on the ZII rev B as well?
I've not noticed any issues there, and I have 5.4 built from my
tree on December 22nd which would've included most of what is in
5.5, and quite a bit of what's queued in net-next.

There, I see:

mv88e6xxx.0/regs:    GLOBAL GLOBAL2 SERDES     0    1    2    3    4    5    6
mv88e6xxx.0/regs: 1:     2    8007     149     3    3    3    3    3 403e   3d
mv88e6xxx.1/regs:    GLOBAL GLOBAL2 SERDES     0    1    2    3    4    5    6
mv88e6xxx.1/regs: 1:     2    8807     14d     3    3    3    3    3 403e 403e
mv88e6xxx.2/regs:    GLOBAL GLOBAL2 SERDES     0    1    2    3    4    5    6    7    8    9
regs: 1:  7209       0    ffff  c503 c503 c503 2403 2403 2403 2403 2403 2403 c13e

which looks fine to me:
- switch 0
   - port 5 is the DSA port, which is forced to 1G.
   - port 6 is the CPU port, which is forced to 100M.
- switch 1
   - ports 5 and 6 are DSA ports, forced to 1G
- switch 2
   - port 9 is the DSA port, forced to 1G.

Booting 5.5 is more noisy than 5.4 - there's loads of complaints about
"already a member of VLAN 1".  As far as the port MAC settings go, it
looks just the same as the 5.4 settings I quoted above.

Now, I do have some differences between what is in mainline and my tree
and one of them involves adding a whole bunch of "phylink_mac_config"
and "phylink_link_force" methods to the mv88e6xxx_ops for Marvell DSA
switches.  Without these, dsa_port_phylink_mac_config() will ignore
phylink's attempts to configure the MAC.

Quite why this is, I don't know; these are patches I've carried for
ages, since trying to get the SFF modules working on these platforms,
before mainline gained phylink support for DSA.  I seem to remember
that mainline's work was based on what I'd done, or was very similar,
but I never really understood why bits such as this were left out.
Since this work has been published online in my git tree since day 1,
I find it really strange that people go off and do what seems to be a
half-hearted implementation.  See the "zii" branch.

Mainline did diverge on the issue of how the SFF modules should be
driven; whether to drive them with the SFP code or whether to use
a fixed-link instead.  I've kept my original approach, which is less
than perfect since we don't have a link interrupt to trigger the call
to phylink_mac_change().  However, I'm suspecting that once we solve
the PCS/MAC split issue, and use the clause 37 phylink PCS helpers
I've proposed in the last few weeks, this will be resolvable.

> Heiner has another device which has an Aquantia PHY running in an odd
> mode so that it does 1G over a T2 link. It uses SGMII for this, and
> that is where we first noticed the issue of the MAC and PCS having
> different configurations.

Do you know when the issue appeared?

It sounds like this regression has been known for some time, yet this
is the first I've heard about it.
Andrew Lunn Jan. 27, 2020, 4:22 p.m. UTC | #11
> > Heiner has another device which has an Aquantia PHY running in an odd
> > mode so that it does 1G over a T2 link. It uses SGMII for this, and
> > that is where we first noticed the issue of the MAC and PCS having
> > different configurations.
> 
> Do you know when the issue appeared?

As far as i understand, it never worked, it is not a regression as
such. But Heiner probably knows more.

      Andrew
Andrew Lunn Jan. 27, 2020, 4:32 p.m. UTC | #12
> Presumably, all these should be visible on the ZII rev B as well?

Maybe. The two SFF mounted on most rev B are connected to ports which
only do SGMII, not 1000Base X. They tend to work by chance, and as
such, i've never taken them seriously.

If i remember correctly, you modified your board, moved the SFF over
to the normally unpopulated slots, and removed a resistor. That setup
then has the SFF connected to the 6352, which can do both SGMII and
1000BaseX.

It could also be that the 6352 does have pass through from the PCS to
the MAC, where as the 6390 does not? The 6390 is much more capable,
having 2.5G and 10G support. The SERDES registers are very different,
C45 vs C22 of the 6352.

    Andrew
Russell King (Oracle) Jan. 27, 2020, 5:56 p.m. UTC | #13
On Mon, Jan 27, 2020 at 05:32:41PM +0100, Andrew Lunn wrote:
> > Presumably, all these should be visible on the ZII rev B as well?
> 
> Maybe. The two SFF mounted on most rev B are connected to ports which
> only do SGMII, not 1000Base X. They tend to work by chance, and as
> such, i've never taken them seriously.
> 
> If i remember correctly, you modified your board, moved the SFF over
> to the normally unpopulated slots, and removed a resistor. That setup
> then has the SFF connected to the 6352, which can do both SGMII and
> 1000BaseX.

Yes, I modified the board to fix a design mistake - removing R412.
The SFF are where they are when they were delivered:

OPT P1 - no module fitted, and the serdes signals are not routed.
	 This might as well not exist.

OPT P2 - Cotsworks SFBG53DRAP.
	 This is connected to port 4 of switch 0, one of the 88E6352.

The 88E6352 can have the serdes block associated with either port 4 or
port 5 depending on the state of the S_SEL signal.  The serdes will be
associated with port 4 if S_SEL is low at reset, and with port 5 if
S_SEL is high at reset.

88E6352 Port 4 RGMII signals are not used.  Port 5 RGMII is used to
connect to the next 88E6352 switch.  So, if the serdes is associated
with port 5, and if RGMII is used, it prevents the use of the serdes.

With R412 fitted, S_SEL is pulled high, and assocates the serdes with
port 5, and hence is unusable.  When R412 is removed, the serdes is
associated with port 4, and can be configured for either SGMII or
1000baseX mode via the PHY detect bit.

So, the ZII rev B, OPT P2 only becomes useful if R412 is removed.

OPT P3 - Cotsworks SFBG53DRAP
	 This is connected to port 3 of switch 2, one of the 88E6185.
OPT P4 - AVAGO AFBR-59R5ALZ
	 This is connected to port 4 of switch 2, one of the 88E6185.

The 88E6185 can only have ports 7, 8 or 9 configured for 1000BASE-X
mode.  These two ports end up configured for cross-chip serdes mode
which is 1000BASE-X but with manually controlled link status, as
this mode is designed to link two 88E6185 to each other (hence
"cross-chip").  There appears to be no accessible serdes block on this
device to give us any interrupts.

With my suggestion for a polling mode in phylink, it may be possible
to get OPT P3 and OPT P4 working.

> It could also be that the 6352 does have pass through from the PCS to
> the MAC, where as the 6390 does not? The 6390 is much more capable,
> having 2.5G and 10G support. The SERDES registers are very different,
> C45 vs C22 of the 6352.

My feeling is that the issues you're seeing with the ZII rev C come
down to the phylink implementation for MV88E6xxx lacking some of the
necessary support, and this has probably been broken ever since
phylink was introduced into the mainline MV88E6xxx driver.

Try

http://git.armlinux.org.uk/cgit/linux-arm.git/patch/?id=eb717ca455b1ae425a4d4b60615ba3e4d0ba35d4

which will be 5.4 based; I haven't pushed out my 5.5 based tree yet
as I'm busy writing emails rather than testing it, and running out
of time to do so before tomorrow!
Heiner Kallweit Jan. 27, 2020, 7:28 p.m. UTC | #14
On 27.01.2020 17:22, Andrew Lunn wrote:
>>> Heiner has another device which has an Aquantia PHY running in an odd
>>> mode so that it does 1G over a T2 link. It uses SGMII for this, and
>>> that is where we first noticed the issue of the MAC and PCS having
>>> different configurations.
>>
>> Do you know when the issue appeared?
> 
> As far as i understand, it never worked, it is not a regression as
> such. But Heiner probably knows more.
> 
I think you're referring to the issue that was fixed with following
commit: 72d8b4fdbfb6 ("net: dsa: mv88e6xxx: support in-band signalling
on SGMII ports with external PHYs"). The commit description also has a
link to the discussion we had about the issue. If I read it correctly
the issue is independent of this proprietary 1000BaseT2 mode having
been used.

>       Andrew
> .
> 
Heiner
Jose Abreu Jan. 28, 2020, 11:12 a.m. UTC | #15
Hi Russell,

Please check bellow for another possibility. On this one I moved almost 
everything to PHYLINK, except the HW related logic which is still in XPCS 
module.

https://github.com/joabreu/linux/commits/stmmac-next

Thanks,
Jose Miguel Abreu
Russell King (Oracle) Jan. 28, 2020, 6:08 p.m. UTC | #16
On Mon, Jan 27, 2020 at 03:51:07PM +0100, Andrew Lunn wrote:
> I've also had issues with the DSA links, also being configured to
> 10/Half. That seems to be related to having a phy-mode property in
> device tree. I need to add a fixed-link property to set the correct
> speed. Something is broken here, previously the fixed-link was only
> needed if the speed needed to be lower than the ports maximum. I think
> that is a separate issue i need to dig into, not part of the PCS to
> MAC transfer.

I think I understand what is happening on this one more fully.

When DSA initialises, the DSA and CPU ports are initially configured to
maximum speed via mv88e6xxx_setup_port(), called via mv88e6xxx_setup(),
the .setup method, dsa_switch_setup(), and dsa_tree_setup_switches().

dsa_tree_setup_switches() then moves on to calling dsa_port_setup().
dsa_port_setup() calls dsa_port_link_register_of() for the DSA and CPU
ports, which calls into dsa_port_phylink_register().

That calls phylink_create(), and then attempts to attach a PHY using
phylink_of_phy_connect() - which itself is rather weird - since when
has a DSA or CPU port been allowed to have a PHY in its DT node?

The upshot is, phylink_create() will (and always has) treated a node
without a fixed-link or in-band specification as a "phy" mode link.
Moving on, phylink_start() will be called.

phylink_start() attempts to set an initial configuration.  As there
is no PHY attached, phylink has no idea what parameters to set, but
it needs to set an initial configuration, so it does so.  The result
is, dsa_port_phylink_mac_config() gets called without the speed and
duplex being set as one would expect.

That hasn't changed in phylink yet - so it's a bug that dates back
to the phylink integration into the DSA core, and is a regression
resulting from that.

The reason my patch above appears to solve it is because I'm ignoring
calls to mac_config() with mode == MLO_AN_PHY in various circumstances,
which results in the initial configuration by mv88e6xxx_setup_port()
remaining.

I'm not yet sure what to do about that; and I'm out of time to think
about that anymore today - but I thought I'd post my analysis so far
in the hope that it helps.
Andrew Lunn Jan. 28, 2020, 6:25 p.m. UTC | #17
On Tue, Jan 28, 2020 at 06:08:03PM +0000, Russell King - ARM Linux admin wrote:
> On Mon, Jan 27, 2020 at 03:51:07PM +0100, Andrew Lunn wrote:
> > I've also had issues with the DSA links, also being configured to
> > 10/Half. That seems to be related to having a phy-mode property in
> > device tree. I need to add a fixed-link property to set the correct
> > speed. Something is broken here, previously the fixed-link was only
> > needed if the speed needed to be lower than the ports maximum. I think
> > that is a separate issue i need to dig into, not part of the PCS to
> > MAC transfer.
> 
> I think I understand what is happening on this one more fully.
> 
> When DSA initialises, the DSA and CPU ports are initially configured to
> maximum speed via mv88e6xxx_setup_port(), called via mv88e6xxx_setup(),
> the .setup method, dsa_switch_setup(), and dsa_tree_setup_switches().
> 
> dsa_tree_setup_switches() then moves on to calling dsa_port_setup().
> dsa_port_setup() calls dsa_port_link_register_of() for the DSA and CPU
> ports, which calls into dsa_port_phylink_register().
> 
> That calls phylink_create(), and then attempts to attach a PHY using
> phylink_of_phy_connect() - which itself is rather weird - since when
> has a DSA or CPU port been allowed to have a PHY in its DT node?

Hi Russell

There are some boards which have back to back PHYs between the SoC and
the Switch. Most designs rely on strapping the PHYs to just work, no
software configuration needed. But then came along a board with a PHY
which needed a kick to make it work :-(

> That hasn't changed in phylink yet - so it's a bug that dates back
> to the phylink integration into the DSA core, and is a regression
> resulting from that.

And i think i probably did not notice it because nearly all the boards
i test with connect to the FEC, which is Fast Ethernet only, so needs
fixed-link properties.

	   Andrew
Russell King (Oracle) Feb. 4, 2020, 5:26 p.m. UTC | #18
On Tue, Jan 28, 2020 at 11:12:05AM +0000, Jose Abreu wrote:
> Hi Russell,
> 
> Please check bellow for another possibility. On this one I moved almost 
> everything to PHYLINK, except the HW related logic which is still in XPCS 
> module.
> 
> https://github.com/joabreu/linux/commits/stmmac-next

I think this is going a little too far; it has the effect of limiting
phylink to using one PCS, but a device may have multiple PCS attached
to the MAC subsystem.

I know that mvpp2's network interfaces are a bit like that - there is
the 10/100/1G/2.5G MAC+PCS subsystem, and an entirely separate 10G
MAC+XPCS/MPCS subsystem which are switched between on the fly.

Similar is true on the LX2160A (which is what I've been working on
over the last week, having dug out some additional information on it
that was hidden inside the PDFs - reference manual PDFs within other
reference manual PDFs is not a nice way to provide documentation -
unless you stumble over the internal link to the file in the outer
PDF, you have no idea that the sub-documentation even exists.)

There, there is one MAC, but there are multiple different PCS - one
for SGMII and 1000base-X, another for 10G, another for 25G, etc.
These PCS are accessed via a MDIO adapter embedded in each of the
MAC hardware blocks.

LX2160A provides some additional complexities at the moment as we
don't yet know whether it's possible to reconfigure the Serdes on
the fly to switch between the various speeds, so we can't get
dynamically switch between (SGMII, 1000base-X), 10G, 25G, 40G,
100G - but a request has been put into NXP before Christmas to see
what would be possible.

So, right now I don't like the idea that we have this probing
mechanism bolted into phylink for PCS PHYs - I think that's a
decision that the MAC driver needs to be making.

Now, you've introduced this phylink_pcs_ops thing - which is similar
to a patch that I've had in my tree for a few weeks while working on
this issue.  As I've already said to you, the issue you currently
have affects multiple people, and I've been working on solving it
in a way that *isn't* specific to one particular hardware - but with
an overview of all the problems that everyone has.

Plus, it's not like we need to rush - we're in the middle of a merge
window right now, so we have about three months before the code will
ultimately be merged into mainline.  We might as well use that time
to work out a solution that doesn't mess up someone else.

Much of the prototype stuff for the LX2160A, including some revised
patches I've sent out during January, are available in my "cex7" and
"phy" branches - but not yet the mac_ops vs pcs_ops bits, which I'm
still working on.  What is in the "cex7" branch is fairly close to
being split into separate MAC / PCS operations, but I haven't yet
moved out my MAC / PCS operations split patches yet.

Now, one of the important changes which may not be obvious from those
branches is the way we pass the link state to the MAC and PCS.
Currently, that was only available via mac_config(), which assumes an
integrated MAC / PCS solution - mac_config() will not be called
(intentionally) for in-band links where no PHY is attached, and I don't
want to change that for several reasons.  Instead, the link_up()
methods get passed the resolved state, and this is the state that a
split MAC / PCS setup should be using to configure the MAC or PCS
when the link comes up.  All fields are guaranteed to be the resolved
state, so no SPEED_UNKNOWN / DUPLEX_UNKNOWN issues unless something
has gone wrong elsewhere - which is one of the other reasons for this,
various users have been having issues due to those passed to
mac_config().

It's not production-ready yet, but I will continue working on it
over the coming week.
Andrew Lunn Feb. 4, 2020, 5:43 p.m. UTC | #19
> There, there is one MAC, but there are multiple different PCS - one
> for SGMII and 1000base-X, another for 10G, another for 25G, etc.
> These PCS are accessed via a MDIO adapter embedded in each of the
> MAC hardware blocks.

Hi Russell

Marvell mv88e6390X switches are like this is a well. There is a PCS
for SGMII and 1000Base-X, and a second one for 10G. And it dynamically
swaps between them depending on the port mode, the so called cmode.

So a generic solution is required, and please take your time to build
one.

> It's not production-ready yet, but I will continue working on it
> over the coming week.

I'm happy to test when you are ready.

Thanks for working on this,

       Andrew
Russell King (Oracle) Feb. 4, 2020, 7:32 p.m. UTC | #20
On Tue, Feb 04, 2020 at 06:43:18PM +0100, Andrew Lunn wrote:
> > There, there is one MAC, but there are multiple different PCS - one
> > for SGMII and 1000base-X, another for 10G, another for 25G, etc.
> > These PCS are accessed via a MDIO adapter embedded in each of the
> > MAC hardware blocks.
> 
> Hi Russell
> 
> Marvell mv88e6390X switches are like this is a well. There is a PCS
> for SGMII and 1000Base-X, and a second one for 10G. And it dynamically
> swaps between them depending on the port mode, the so called cmode.
> 
> So a generic solution is required, and please take your time to build
> one.

Well, DSA is quite a mixed bag...

As far as I can work out, the situation with the CPU and DSA ports is
quite hopeless - you've claimed that a change in phylink has broken it,
I can't find what that may be.  The fact is, phylink has never had any
link information for DSA links when no fixed-link property has been
specified in DT.  As I've already said in a previous email about this,
I can't see *any* sane way to fix that - but there was no response.


On a more positive note...

The mac_link_up() changes that I've talked about should work for DSA,
if only there was a reasonable way to reconfigure the ports.  If you
look at the "phy" branch, you will notice that there's a patch there -
"net: mv88e6xxx: use resolved link config in mac_link_up()" which adds
the support to configure the MAC manually.  It's rather messy, and I
see no way to deal with the pause settings.  There is support in some
Marvell DSA switches to force flow control but that's not supported
through the current mid-layer at all (port_set_pause doesn't do it.)
I'm not sure whether the "mv88e6xxx_phy_is_internal()" check there is
the right test for every DSA switch correct either.

What is missing is reading the results from the PCS (aka serdes) and
forwarding them into phylink - I did have a quick look at how that might
be possible, but the DSA code structure (consisting of multiple
mid-layers) makes it hard without rewriting quite a lot of code.  That's
fine if you know all the DSA chips inside out, but I don't - and that's
where we need someone who has the knowledge of all DSA switches that we
support.  Or, we get rid of the multiple mid-layers and switch to a
library approach, so that we can modify support for one DSA switch
without affecting everything.  It may be a simple matter of dropping the
existing serdes workaround, but I'm not sure at the moment.

I've tried this code out on the ZII rev B, I haven't tried it on the rev
C which has the 6390 switches yet.
Russell King (Oracle) Feb. 5, 2020, 12:27 p.m. UTC | #21
On Tue, Feb 04, 2020 at 07:32:30PM +0000, Russell King - ARM Linux admin wrote:
> On Tue, Feb 04, 2020 at 06:43:18PM +0100, Andrew Lunn wrote:
> > > There, there is one MAC, but there are multiple different PCS - one
> > > for SGMII and 1000base-X, another for 10G, another for 25G, etc.
> > > These PCS are accessed via a MDIO adapter embedded in each of the
> > > MAC hardware blocks.
> > 
> > Hi Russell
> > 
> > Marvell mv88e6390X switches are like this is a well. There is a PCS
> > for SGMII and 1000Base-X, and a second one for 10G. And it dynamically
> > swaps between them depending on the port mode, the so called cmode.
> > 
> > So a generic solution is required, and please take your time to build
> > one.
> 
> Well, DSA is quite a mixed bag...
> 
> As far as I can work out, the situation with the CPU and DSA ports is
> quite hopeless - you've claimed that a change in phylink has broken it,
> I can't find what that may be.  The fact is, phylink has never had any
> link information for DSA links when no fixed-link property has been
> specified in DT.  As I've already said in a previous email about this,
> I can't see *any* sane way to fix that - but there was no response.
> 
> 
> On a more positive note...
> 
> The mac_link_up() changes that I've talked about should work for DSA,
> if only there was a reasonable way to reconfigure the ports.  If you
> look at the "phy" branch, you will notice that there's a patch there -
> "net: mv88e6xxx: use resolved link config in mac_link_up()" which adds
> the support to configure the MAC manually.  It's rather messy, and I
> see no way to deal with the pause settings.  There is support in some
> Marvell DSA switches to force flow control but that's not supported
> through the current mid-layer at all (port_set_pause doesn't do it.)
> I'm not sure whether the "mv88e6xxx_phy_is_internal()" check there is
> the right test for every DSA switch correct either.
> 
> What is missing is reading the results from the PCS (aka serdes) and
> forwarding them into phylink - I did have a quick look at how that might
> be possible, but the DSA code structure (consisting of multiple
> mid-layers) makes it hard without rewriting quite a lot of code.  That's
> fine if you know all the DSA chips inside out, but I don't - and that's
> where we need someone who has the knowledge of all DSA switches that we
> support.  Or, we get rid of the multiple mid-layers and switch to a
> library approach, so that we can modify support for one DSA switch
> without affecting everything.  It may be a simple matter of dropping the
> existing serdes workaround, but I'm not sure at the moment.
> 
> I've tried this code out on the ZII rev B, I haven't tried it on the rev
> C which has the 6390 switches yet.

Well, it seems GPIO hogging with the sx1503q (for the 3310 PHY, which
is a local change) has broken sometime between v4.20 and v5.5, which
prevents the sx1503q driver probing:

[   23.378706] gpio gpiochip7: (sx1503q): setup of own GPIO 10g-rstn failed
[   23.394858] requesting hog GPIO 10g-rstn (chip sx1503q, offset 9) failed, -517
[   23.402512] gpiochip_add_data_with_key: GPIOs 480..495 (sx1503q) failed to register, -517

Without the hog, the 3310 PHY doesn't come out of reset, so I lose
port 9 on the first switch.

With that removed, I can boot, and if I bring up sff2, I see the port 9
on the second switch status report 0xef4b and control 0x303f without
fiber connected.  I'm out of time to do anything further on this today
(not even decode those), because its taken all morning to get the board
to this point, and I won't have any time tomorrow either.
Russell King (Oracle) Feb. 7, 2020, 11:21 a.m. UTC | #22
On Wed, Feb 05, 2020 at 12:27:33PM +0000, Russell King - ARM Linux admin wrote:
> On Tue, Feb 04, 2020 at 07:32:30PM +0000, Russell King - ARM Linux admin wrote:
> > On Tue, Feb 04, 2020 at 06:43:18PM +0100, Andrew Lunn wrote:
> > > > There, there is one MAC, but there are multiple different PCS - one
> > > > for SGMII and 1000base-X, another for 10G, another for 25G, etc.
> > > > These PCS are accessed via a MDIO adapter embedded in each of the
> > > > MAC hardware blocks.
> > > 
> > > Hi Russell
> > > 
> > > Marvell mv88e6390X switches are like this is a well. There is a PCS
> > > for SGMII and 1000Base-X, and a second one for 10G. And it dynamically
> > > swaps between them depending on the port mode, the so called cmode.
> > > 
> > > So a generic solution is required, and please take your time to build
> > > one.
> > 
> > Well, DSA is quite a mixed bag...
> > 
> > As far as I can work out, the situation with the CPU and DSA ports is
> > quite hopeless - you've claimed that a change in phylink has broken it,
> > I can't find what that may be.  The fact is, phylink has never had any
> > link information for DSA links when no fixed-link property has been
> > specified in DT.  As I've already said in a previous email about this,
> > I can't see *any* sane way to fix that - but there was no response.
> > 
> > 
> > On a more positive note...
> > 
> > The mac_link_up() changes that I've talked about should work for DSA,
> > if only there was a reasonable way to reconfigure the ports.  If you
> > look at the "phy" branch, you will notice that there's a patch there -
> > "net: mv88e6xxx: use resolved link config in mac_link_up()" which adds
> > the support to configure the MAC manually.  It's rather messy, and I
> > see no way to deal with the pause settings.  There is support in some
> > Marvell DSA switches to force flow control but that's not supported
> > through the current mid-layer at all (port_set_pause doesn't do it.)
> > I'm not sure whether the "mv88e6xxx_phy_is_internal()" check there is
> > the right test for every DSA switch correct either.
> > 
> > What is missing is reading the results from the PCS (aka serdes) and
> > forwarding them into phylink - I did have a quick look at how that might
> > be possible, but the DSA code structure (consisting of multiple
> > mid-layers) makes it hard without rewriting quite a lot of code.  That's
> > fine if you know all the DSA chips inside out, but I don't - and that's
> > where we need someone who has the knowledge of all DSA switches that we
> > support.  Or, we get rid of the multiple mid-layers and switch to a
> > library approach, so that we can modify support for one DSA switch
> > without affecting everything.  It may be a simple matter of dropping the
> > existing serdes workaround, but I'm not sure at the moment.
> > 
> > I've tried this code out on the ZII rev B, I haven't tried it on the rev
> > C which has the 6390 switches yet.
> 
> Well, it seems GPIO hogging with the sx1503q (for the 3310 PHY, which
> is a local change) has broken sometime between v4.20 and v5.5, which
> prevents the sx1503q driver probing:
> 
> [   23.378706] gpio gpiochip7: (sx1503q): setup of own GPIO 10g-rstn failed
> [   23.394858] requesting hog GPIO 10g-rstn (chip sx1503q, offset 9) failed, -517
> [   23.402512] gpiochip_add_data_with_key: GPIOs 480..495 (sx1503q) failed to register, -517
> 
> Without the hog, the 3310 PHY doesn't come out of reset, so I lose
> port 9 on the first switch.
> 
> With that removed, I can boot, and if I bring up sff2, I see the port 9
> on the second switch status report 0xef4b and control 0x303f without
> fiber connected.  I'm out of time to do anything further on this today
> (not even decode those), because its taken all morning to get the board
> to this point, and I won't have any time tomorrow either.

Okay, I have a solution for the serdes ports on the mv88e6390 family
of switches (I hope all serdes blocks are the same across those)
tested on the ZII rev C - as expected, that requires no further
changes to phylink beyond what I've already stated in these threads.

It's a bit hacky at the moment because of all the layering that's
going on in DSA and mv88e6xxx - which will become worse if we split
the current phylink_mac_ops into separate MAC and PCS operations -
giving a number of extra problems.

I've pulled the patches into my "zii" branch - no idea if that will
build, I've a fair number of experimental phylink patches that I've
been working on for the split PCS/MAC issue that aren't a part of
that series.  I may need to shuffle some patches around...

Much of the base of the "phy" branch base is what was submitted for
net-next - "net: phy: provide phy driver start/stop hooks" marks the
boundary between stuff in net-next (before that commit) and stuff
which isn't.  I was planning to submit "net: linkmode: make
linkmode_test_bit() take const pointer" through "net: phylink: clarify
flow control settings in documentation" for this merge window, but it
wasn't tested enough to make it in time.

There's also a fix for DSA buried in there; DSA fails to call
phylink_stop() before tearing down phylink for DSA and CPU ports
"net: dsa: fix phylink_start()/phylink_stop() calls".  I'm not sure
that's the best solution yet, but I just hacked something up so that
unloading the mv88e6xxx module could be done reliably.

While working on this, I didn't notice where the behaviour you
described wrt serdes was coming from, so it'll be interesting to see
whether the issue still exists.  It may be wise if you enable phylink
debugging to grab what's going on, particularly with the mac_config()
calls before trying out any of the above patches.

For others, the Clause 37 patch and a few others have changed a bit
since I posted it as a result of working on these issues.  All this is
very much "up in the air" still.

I was planning to spend more time on this today, but unfortunately
other issues have got in the way, so I've pushed the stuff out, and
see what 0-day finds - I may be able to do a bit more later today.
diff mbox series

Patch

diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c
index 4174d874b1f7..75dbaf80d7a5 100644
--- a/drivers/net/phy/phylink.c
+++ b/drivers/net/phy/phylink.c
@@ -533,10 +533,20 @@  static void phylink_resolve(struct work_struct *w)
 
 	if (link_changed) {
 		pl->old_link_state = link_state.link;
-		if (!link_state.link)
+		if (!link_state.link) {
 			phylink_mac_link_down(pl);
-		else
+		} else {
+			/* If no PHY is connected, we still need to configure
+			 * MAC/PCS for flow control and speed.
+			 */
+			if (!pl->phydev) {
+				phylink_resolve_flow(pl, &link_state);
+				phylink_mac_config(pl, &link_state);
+			}
+
 			phylink_mac_link_up(pl, link_state);
+		}
+
 	}
 	if (!link_state.link && pl->mac_link_dropped) {
 		pl->mac_link_dropped = false;