Message ID | 20191217221831.10923-1-olteanv@gmail.com |
---|---|
Headers | show |
Series | Convert Felix DSA switch to PHYLINK | expand |
On Wed, Dec 18, 2019 at 12:18:23AM +0200, Vladimir Oltean wrote: > The SerDes protocol which the driver calls 2500Base-X mode (a misnomer) is more > interesting. There is a description of how it works and what can be done with > it in patch 8/8 (in a comment above vsc9959_pcs_init_2500basex). > In short, it is a fixed speed protocol with no auto-negotiation whatsoever. > From my research of the SGMII-2500 patent [1], it has nothing to do with > SGMII-2500. That one: > * does not define any change to the AN base page compared to plain 10/100/1000 > SGMII. This implies that the 2500 speed is not negotiable, but the other > speeds are. In our case, when the SerDes is configured for this protocol it's > configured for good, there's no going back to SGMII. > * runs at a higher base frequency than regular SGMII. So SGMII-2500 operating > at 1000 Mbps wouldn't interoperate with plain SGMII at 1000 Mbps. Strange, > but ok.. > * Emulates lower link speeds than 2500 by duplicating the codewords twice, then > thrice, then twice again etc (2.5/25/250 times on average). The Layerscape > PCS doesn't do that (it is fixed at 2500 Mbaud). > > But on the other hand it isn't Base-X either, since it doesn't do 802.3z / > clause 37 auto negotiation (flow control, local/remote fault etc). Most documentation I've seen for these 2.5Gbps modes from a few vendors suggests that every vendor has its own ideas. With Marvell, for example, whether the gigabit MAC is operating at 1G or 2.5G is controlled by the serdes "comphy" block, and from what I can tell merely increases the clocking rate by 2.5x when 2.5G mode is selected. I suspect all the features of conventional 1G mode are also available when running at this higher speed, but many don't make sense. For example, if the MAC (there's no distinction of the PCS there) was configured for 100Mbps, but the comphy was configured for 2.5G, we'd end up with 250Mbps by 10x replication on the link. This has never been tested though. Whether in-band AN is enabled or not is separately configurable, although the Marvell Armada 370 documentation states that when the port is configured for 1000Base-X, in-band AN must be enabled. I suspect the same is stated for later devices. However, practical testing seems to suggest that it will work without in-band AN enabled if the other side doesn't want in-band AN. > So it is a protocol in its own right (a rather fixed one). If reviewers think > it should have its own phy-mode, different than 2500base-x, I'm in favor of > that. It's just that I couldn't find a suitable name for it. quarter-xaui? > 3125mhz-8b10b? I think just call it 2500base-x without in-band negotiation. > When inter-operating with the Aquantia AQR112 and AQR412 PHYs in this mode (for > which the PHY uses a proprietary name "OCSGMII"), we do still need to support > the lower link speeds negotiated by the PHY on copper side. So what we > typically do is we enable rate adaptation in the PHY firmware, with pause > frames and a fixed link speed on the system side. Raising this as a discussion > item to understand how we can model this quirky operating mode in Linux without > imposing limitations on others (we have some downstream patches on the Aquantia > PHY driver as well). > > Another item to discuss is whether we should be able to disable AN in the PCS > independently of whether we have a PHY as a peer or not. With an SGMII PHY-less > connection, there may be no auto-negotiation master so the link partners will > bounce for a while and then they'll settle on 10 Mbps as link speed. For those > connections (such as an SGMII link to a downstream switch) we need to disable > AN in the PCS and force the speed to 1000. With Marvell, there's the in-band AN enable bit, but there's also an in-band AN bypass bit, which allows the in-band AN to time out when the 16-bit configuration word is not received after a certain period of time after the link has been established. As for disabling AN... see below. > So: > * If we have a PHY we want to do auto-neg > * If we don't have a PHY, maybe we want AN, maybe we don't > * In the 2500Base-X mode, we can't do AN because the hardware doesn't support it > > I have found the 'managed = "in-band-status"' DTS binding to somewhat address > this concern, but I am a bit reluctant to disable SGMII AN if it isn't set. I believe the in-band-status thing came from mvneta prior to phylink. mvneta supports operating in SGMII with no in-band AN, and the default setup that the driver adopted in SGMII mode was without in-band AN enabled. When in-band AN was required, that's when this DT property came about. mvneta was the basis for phylink creation (as it was the first platform I had that we needed to support SFPs). Compatibility with mvneta had to be maintained to avoid regressions, so that got built-in to phylink. When we are not operating in in-band AN mode, then yes, we do disable in-band AN at the MAC for mvneta and mvpp2. What others do, I haven't delved into. However, it is important - I have a SFP that uses a Broadcom PHY that does not provide any in-band AN when in SGMII mode. It is SGMII mode, because it supports the symbol replication for 100 and 10Mbps - but Broadcom's expectation is that the MAC is forced to the appropriate speed after reading the PHY registers. I think the reason it does this is because it's a NBASE-T PHY, and you have to read the PHY registers to know what protocol its using on the serdes lane, which could be one of 10GBASE-R, 5000BASE-X, 2500BASE-X or SGMII at 1G, 100M or 10M depending on the copper side negotiation results. This works with mvneta and mvpp2 since, when operating in fixed or phy mode in phylink, we disable in-band AN and force the MAC settings. Keeping this consistent across drivers (where possible) would be my preference to avoid different behaviours and surprises. > We have boards with device trees that need to be migrated from PHYLIB and I > am concerned that changing the behavior w.r.t. in-band AN (when the > "in-band-status" property is not present) is not going to be pleasant. > I do understand that the "in-band-status" property is primarily intended to be > used with PHY-less setups, and it looks like the fact it does work with PHYs as > well is more of an oversight (as patch 2/8 shows). So I'm not sure who else > really uses it with a phy-handle. That partly depends on the PHY. Given what I've said above, some PHYs require in-band AN to complete before they will pass data, other MACs will not establish a link if in-band AN is enabled but there is no in-band control word, even if the bypass bit is enabled. Using in-band AN with a PHY results in faster link establishment, and also has the advantage that when the link goes down, the MAC responds a lot quicker than the 1sec phylib poll to stop transmitting. I tend to run my boards with in-band AN enabled even for on-board PHYs although I've not pushed those patches upstream. Most PHYs on these boards come up with in-band AN bypass enabled, so it doesn't matter whether in-band AN is enabled or disabled.
Hi Russell, On Wed, 18 Dec 2019 at 12:40, Russell King - ARM Linux admin <linux@armlinux.org.uk> wrote: > > On Wed, Dec 18, 2019 at 12:18:23AM +0200, Vladimir Oltean wrote: > > The SerDes protocol which the driver calls 2500Base-X mode (a misnomer) is more > > interesting. There is a description of how it works and what can be done with > > it in patch 8/8 (in a comment above vsc9959_pcs_init_2500basex). > > In short, it is a fixed speed protocol with no auto-negotiation whatsoever. > > From my research of the SGMII-2500 patent [1], it has nothing to do with > > SGMII-2500. That one: > > * does not define any change to the AN base page compared to plain 10/100/1000 > > SGMII. This implies that the 2500 speed is not negotiable, but the other > > speeds are. In our case, when the SerDes is configured for this protocol it's > > configured for good, there's no going back to SGMII. > > * runs at a higher base frequency than regular SGMII. So SGMII-2500 operating > > at 1000 Mbps wouldn't interoperate with plain SGMII at 1000 Mbps. Strange, > > but ok.. > > * Emulates lower link speeds than 2500 by duplicating the codewords twice, then > > thrice, then twice again etc (2.5/25/250 times on average). The Layerscape > > PCS doesn't do that (it is fixed at 2500 Mbaud). > > > > But on the other hand it isn't Base-X either, since it doesn't do 802.3z / > > clause 37 auto negotiation (flow control, local/remote fault etc). > > Most documentation I've seen for these 2.5Gbps modes from a few vendors > suggests that every vendor has its own ideas. > > With Marvell, for example, whether the gigabit MAC is operating at 1G > or 2.5G is controlled by the serdes "comphy" block, and from what I can > tell merely increases the clocking rate by 2.5x when 2.5G mode is > selected. > > I suspect all the features of conventional 1G mode are also available > when running at this higher speed, but many don't make sense. For > example, if the MAC (there's no distinction of the PCS there) was > configured for 100Mbps, but the comphy was configured for 2.5G, we'd > end up with 250Mbps by 10x replication on the link. This has never been > tested though. > > Whether in-band AN is enabled or not is separately configurable, > although the Marvell Armada 370 documentation states that when the port > is configured for 1000Base-X, in-band AN must be enabled. I suspect the > same is stated for later devices. However, practical testing seems to > suggest that it will work without in-band AN enabled if the other side > doesn't want in-band AN. > > > So it is a protocol in its own right (a rather fixed one). If reviewers think > > it should have its own phy-mode, different than 2500base-x, I'm in favor of > > that. It's just that I couldn't find a suitable name for it. quarter-xaui? > > 3125mhz-8b10b? > > I think just call it 2500base-x without in-band negotiation. > Ok, I already have the check in vsc9959_pcs_init_2500basex for link_an_mode == MLO_AN_INBAND, so I guess that should be fine. > > When inter-operating with the Aquantia AQR112 and AQR412 PHYs in this mode (for > > which the PHY uses a proprietary name "OCSGMII"), we do still need to support > > the lower link speeds negotiated by the PHY on copper side. So what we > > typically do is we enable rate adaptation in the PHY firmware, with pause > > frames and a fixed link speed on the system side. Raising this as a discussion > > item to understand how we can model this quirky operating mode in Linux without > > imposing limitations on others (we have some downstream patches on the Aquantia > > PHY driver as well). > > > > Another item to discuss is whether we should be able to disable AN in the PCS > > independently of whether we have a PHY as a peer or not. With an SGMII PHY-less > > connection, there may be no auto-negotiation master so the link partners will > > bounce for a while and then they'll settle on 10 Mbps as link speed. For those > > connections (such as an SGMII link to a downstream switch) we need to disable > > AN in the PCS and force the speed to 1000. > > With Marvell, there's the in-band AN enable bit, but there's also an > in-band AN bypass bit, which allows the in-band AN to time out when > the 16-bit configuration word is not received after a certain period > of time after the link has been established. > > As for disabling AN... see below. > > > So: > > * If we have a PHY we want to do auto-neg > > * If we don't have a PHY, maybe we want AN, maybe we don't > > * In the 2500Base-X mode, we can't do AN because the hardware doesn't support it > > > > I have found the 'managed = "in-band-status"' DTS binding to somewhat address > > this concern, but I am a bit reluctant to disable SGMII AN if it isn't set. > > I believe the in-band-status thing came from mvneta prior to phylink. > mvneta supports operating in SGMII with no in-band AN, and the default > setup that the driver adopted in SGMII mode was without in-band AN > enabled. When in-band AN was required, that's when this DT property > came about. > > mvneta was the basis for phylink creation (as it was the first platform > I had that we needed to support SFPs). Compatibility with mvneta had to > be maintained to avoid regressions, so that got built-in to phylink. > > When we are not operating in in-band AN mode, then yes, we do disable > in-band AN at the MAC for mvneta and mvpp2. What others do, I haven't > delved into. However, it is important - I have a SFP that uses a > Broadcom PHY that does not provide any in-band AN when in SGMII mode. > It is SGMII mode, because it supports the symbol replication for 100 > and 10Mbps - but Broadcom's expectation is that the MAC is forced to > the appropriate speed after reading the PHY registers. > > I think the reason it does this is because it's a NBASE-T PHY, and you > have to read the PHY registers to know what protocol its using on the > serdes lane, which could be one of 10GBASE-R, 5000BASE-X, 2500BASE-X > or SGMII at 1G, 100M or 10M depending on the copper side negotiation > results. > > This works with mvneta and mvpp2 since, when operating in fixed or > phy mode in phylink, we disable in-band AN and force the MAC > settings. > > Keeping this consistent across drivers (where possible) would be my > preference to avoid different behaviours and surprises. > > > We have boards with device trees that need to be migrated from PHYLIB and I > > am concerned that changing the behavior w.r.t. in-band AN (when the > > "in-band-status" property is not present) is not going to be pleasant. > > I do understand that the "in-band-status" property is primarily intended to be > > used with PHY-less setups, and it looks like the fact it does work with PHYs as > > well is more of an oversight (as patch 2/8 shows). So I'm not sure who else > > really uses it with a phy-handle. > > That partly depends on the PHY. Given what I've said above, some PHYs > require in-band AN to complete before they will pass data, other MACs > will not establish a link if in-band AN is enabled but there is no > in-band control word, even if the bypass bit is enabled. > I think the Layerscape PCS falls to 10 Mbps if AN is enabled but no config word is received. At least that's how it behaves when it is put in loopback mode. Naturally traffic doesn't pass through the PHY if that has negotiated a gigabit link on copper side. > Using in-band AN with a PHY results in faster link establishment, and > also has the advantage that when the link goes down, the MAC responds > a lot quicker than the 1sec phylib poll to stop transmitting. > > I tend to run my boards with in-band AN enabled even for on-board PHYs > although I've not pushed those patches upstream. Most PHYs on these > boards come up with in-band AN bypass enabled, so it doesn't matter > whether in-band AN is enabled or disabled. > Ok, makes some sense. We discussed a little bit internally and the conclusion, as Alex put it, is that it takes two to AN. Perhaps I would be more confident about this if the PHY driver had awareness of the PHYLINK link_an_mode, in order to coordinate its settings with what the PCS expects, and vice-versa. On the LS1028A-RDB board we have 2 issues related to SGMII AN at the moment: - The at803x.c driver explicitly checks for the ACK from the MAC PCS, and prints "SGMII link is not ok" otherwise, and refuses to bring the link up. This hurts us in 4.19 because I think the check is a bit misplaced in the .aneg_done callback. To be precise, what we observe is that this function is not called by the state machine a second, third time etc to recheck if the AN has completed in the meantime. In current net-next, as far as I could figure out, at803x_aneg_done is dead code. What is ironic about the commit f62265b53ef3 ("at803x: double check SGMII side autoneg") that introduced this function is that it's for the gianfar driver (Freescale eTSEC), a MAC that has never supported reprogramming itself based on the in-band config word. In fact, if you look at gfar_configure_serdes, it even configures its register 0x4 with an advertisement for 1000Base-X, not SGMII (0x4001). So I really wonder if there is any real purpose to this check in at803x_aneg_done, and if not, I would respectfully remove it. - The vsc8514 PHY driver configures SerDes AN in U-Boot, but not in Linux. So we observe that if we disable PHY configuration in U-Boot, in-band AN breaks in Linux. We are actually wondering how we should fix this: from what you wrote above, it seems ok to hardcode SGMII AN in the PHY driver, and just ignore it in the PCS if managed = "in-band-status" is not set with PHYLINK. But as you said, in the general case maybe not all PHYs work until they haven't received the ACK from the MAC PCS, which makes this insufficient as a general solution. But the 2 cases above illustrate the lack of consistency among PHY drivers w.r.t. in-band aneg. > -- > RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ > FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up > According to speedtest.net: 11.9Mbps down 500kbps up Thanks, -Vladimir
On Wed, Dec 18, 2019 at 03:21:02PM +0200, Vladimir Oltean wrote: > - The at803x.c driver explicitly checks for the ACK from the MAC PCS, > and prints "SGMII link is not ok" otherwise, and refuses to bring the > link up. This hurts us in 4.19 because I think the check is a bit > misplaced in the .aneg_done callback. To be precise, what we observe > is that this function is not called by the state machine a second, > third time etc to recheck if the AN has completed in the meantime. In > current net-next, as far as I could figure out, at803x_aneg_done is > dead code. What is ironic about the commit f62265b53ef3 ("at803x: > double check SGMII side autoneg") that introduced this function is > that it's for the gianfar driver (Freescale eTSEC), a MAC that has > never supported reprogramming itself based on the in-band config word. > In fact, if you look at gfar_configure_serdes, it even configures its > register 0x4 with an advertisement for 1000Base-X, not SGMII (0x4001). > So I really wonder if there is any real purpose to this check in > at803x_aneg_done, and if not, I would respectfully remove it. Please check whether at803x will pass data if the SGMII config exchange has not completed - I'm aware of some PHYs that, although link comes up on the copper side, if AN does not complete on the SGMII side, they will not pass data, even if the MAC side is forced up. I don't see any configuration bits in the 8031 that suggest the SGMII config exchange can be bypassed. > - The vsc8514 PHY driver configures SerDes AN in U-Boot, but not in > Linux. So we observe that if we disable PHY configuration in U-Boot, > in-band AN breaks in Linux. We are actually wondering how we should > fix this: from what you wrote above, it seems ok to hardcode SGMII AN > in the PHY driver, and just ignore it in the PCS if managed = > "in-band-status" is not set with PHYLINK. But as you said, in the > general case maybe not all PHYs work until they haven't received the > ACK from the MAC PCS, which makes this insufficient as a general > solution. > > But the 2 cases above illustrate the lack of consistency among PHY > drivers w.r.t. in-band aneg. Indeed - it's something of a mine field at the moment, because we aren't quite sure whether "SGMII" means that the PHY requires in-band AN or doesn't provide it. For the Broadcom case I mentioned, when it's used on a SFP, I've had to add a quirk to phylink to work around it. The problem is, it's not a case that the MAC can demand that the PHY provides in-band config - some PHYs are incapable of doing so. Whatever solution we come up with needs to be a "negotiation" between the PHY driver and the MAC driver for it to work well in the known scenarios - like the case with the Broadcom PHY on a SFP that can be plugged into any SFP supporting network interface...
On 12/18/2019 2:29 PM, Russell King - ARM Linux admin wrote: > On Wed, Dec 18, 2019 at 03:21:02PM +0200, Vladimir Oltean wrote: >> - The at803x.c driver explicitly checks for the ACK from the MAC PCS, >> and prints "SGMII link is not ok" otherwise, and refuses to bring the >> link up. This hurts us in 4.19 because I think the check is a bit >> misplaced in the .aneg_done callback. To be precise, what we observe >> is that this function is not called by the state machine a second, >> third time etc to recheck if the AN has completed in the meantime. In >> current net-next, as far as I could figure out, at803x_aneg_done is >> dead code. What is ironic about the commit f62265b53ef3 ("at803x: >> double check SGMII side autoneg") that introduced this function is >> that it's for the gianfar driver (Freescale eTSEC), a MAC that has >> never supported reprogramming itself based on the in-band config word. >> In fact, if you look at gfar_configure_serdes, it even configures its >> register 0x4 with an advertisement for 1000Base-X, not SGMII (0x4001). >> So I really wonder if there is any real purpose to this check in >> at803x_aneg_done, and if not, I would respectfully remove it. > > Please check whether at803x will pass data if the SGMII config exchange > has not completed - I'm aware of some PHYs that, although link comes up > on the copper side, if AN does not complete on the SGMII side, they > will not pass data, even if the MAC side is forced up. > > I don't see any configuration bits in the 8031 that suggest the SGMII > config exchange can be bypassed. > >> - The vsc8514 PHY driver configures SerDes AN in U-Boot, but not in >> Linux. So we observe that if we disable PHY configuration in U-Boot, >> in-band AN breaks in Linux. We are actually wondering how we should >> fix this: from what you wrote above, it seems ok to hardcode SGMII AN >> in the PHY driver, and just ignore it in the PCS if managed = >> "in-band-status" is not set with PHYLINK. But as you said, in the >> general case maybe not all PHYs work until they haven't received the >> ACK from the MAC PCS, which makes this insufficient as a general >> solution. >> >> But the 2 cases above illustrate the lack of consistency among PHY >> drivers w.r.t. in-band aneg. > > Indeed - it's something of a mine field at the moment, because we aren't > quite sure whether "SGMII" means that the PHY requires in-band AN or > doesn't provide it. For the Broadcom case I mentioned, when it's used on > a SFP, I've had to add a quirk to phylink to work around it. > > The problem is, it's not a case that the MAC can demand that the PHY > provides in-band config - some PHYs are incapable of doing so. Whatever > solution we come up with needs to be a "negotiation" between the PHY > driver and the MAC driver for it to work well in the known scenarios - > like the case with the Broadcom PHY on a SFP that can be plugged into > any SFP supporting network interface... Some sort of capability negotiation does seem to be the proper solution. We can have a new capabilities field in phydev for system interface capabilities and match that with MAC capabilities, configuration, factor in the quirks. The result would tell if a solution is possible, especially with quirky PHYs, and if PHY drivers need to enable AN. Until we have that in place, any recommended approach for PHY drivers, is it acceptable to hardcode system side AN on as a short term fix? I've just tested VSC8514 and it doesn't allow traffic through if SI AN is enabled but does not complete. We do use it with AN on on NXP systems, and it only works because U-Boot sets things up that way, but relying on U-Boot isn't great. Aquantia PHYs we use also require AN to complete if enabled. For them Linux depends on U-Boot or on PHY firmware to enable AN. I don't know if anyone out there uses these PHYs with AN off. Would a patch that hardcodes AN on for any of these PHYs be acceptable? Thanks! Alex
On Wed, Dec 18, 2019 at 03:00:41PM +0000, Alexandru Marginean wrote: > On 12/18/2019 2:29 PM, Russell King - ARM Linux admin wrote: > > On Wed, Dec 18, 2019 at 03:21:02PM +0200, Vladimir Oltean wrote: > >> - The at803x.c driver explicitly checks for the ACK from the MAC PCS, > >> and prints "SGMII link is not ok" otherwise, and refuses to bring the > >> link up. This hurts us in 4.19 because I think the check is a bit > >> misplaced in the .aneg_done callback. To be precise, what we observe > >> is that this function is not called by the state machine a second, > >> third time etc to recheck if the AN has completed in the meantime. In > >> current net-next, as far as I could figure out, at803x_aneg_done is > >> dead code. What is ironic about the commit f62265b53ef3 ("at803x: > >> double check SGMII side autoneg") that introduced this function is > >> that it's for the gianfar driver (Freescale eTSEC), a MAC that has > >> never supported reprogramming itself based on the in-band config word. > >> In fact, if you look at gfar_configure_serdes, it even configures its > >> register 0x4 with an advertisement for 1000Base-X, not SGMII (0x4001). > >> So I really wonder if there is any real purpose to this check in > >> at803x_aneg_done, and if not, I would respectfully remove it. > > > > Please check whether at803x will pass data if the SGMII config exchange > > has not completed - I'm aware of some PHYs that, although link comes up > > on the copper side, if AN does not complete on the SGMII side, they > > will not pass data, even if the MAC side is forced up. > > > > I don't see any configuration bits in the 8031 that suggest the SGMII > > config exchange can be bypassed. > > > >> - The vsc8514 PHY driver configures SerDes AN in U-Boot, but not in > >> Linux. So we observe that if we disable PHY configuration in U-Boot, > >> in-band AN breaks in Linux. We are actually wondering how we should > >> fix this: from what you wrote above, it seems ok to hardcode SGMII AN > >> in the PHY driver, and just ignore it in the PCS if managed = > >> "in-band-status" is not set with PHYLINK. But as you said, in the > >> general case maybe not all PHYs work until they haven't received the > >> ACK from the MAC PCS, which makes this insufficient as a general > >> solution. > >> > >> But the 2 cases above illustrate the lack of consistency among PHY > >> drivers w.r.t. in-band aneg. > > > > Indeed - it's something of a mine field at the moment, because we aren't > > quite sure whether "SGMII" means that the PHY requires in-band AN or > > doesn't provide it. For the Broadcom case I mentioned, when it's used on > > a SFP, I've had to add a quirk to phylink to work around it. > > > > The problem is, it's not a case that the MAC can demand that the PHY > > provides in-band config - some PHYs are incapable of doing so. Whatever > > solution we come up with needs to be a "negotiation" between the PHY > > driver and the MAC driver for it to work well in the known scenarios - > > like the case with the Broadcom PHY on a SFP that can be plugged into > > any SFP supporting network interface... > > Some sort of capability negotiation does seem to be the proper solution. > We can have a new capabilities field in phydev for system interface > capabilities and match that with MAC capabilities, configuration, factor > in the quirks. The result would tell if a solution is possible, > especially with quirky PHYs, and if PHY drivers need to enable AN. > > Until we have that in place, any recommended approach for PHY drivers, > is it acceptable to hardcode system side AN on as a short term fix? > I've just tested VSC8514 and it doesn't allow traffic through if SI AN > is enabled but does not complete. We do use it with AN on on NXP > systems, and it only works because U-Boot sets things up that way, but > relying on U-Boot isn't great. > Aquantia PHYs we use also require AN to complete if enabled. For them > Linux depends on U-Boot or on PHY firmware to enable AN. I don't know > if anyone out there uses these PHYs with AN off. Would a patch that > hardcodes AN on for any of these PHYs be acceptable? I'm not sure why you're talking about hard-coding anything. As I've already mentioned, phylink allows you to specify today whether you want to use in-band AN or not, provided the MAC implements it as is done with mvneta and mvpp2.
On 12/18/2019 6:22 PM, Russell King - ARM Linux admin wrote: > On Wed, Dec 18, 2019 at 03:00:41PM +0000, Alexandru Marginean wrote: >> On 12/18/2019 2:29 PM, Russell King - ARM Linux admin wrote: >>> On Wed, Dec 18, 2019 at 03:21:02PM +0200, Vladimir Oltean wrote: >>>> - The at803x.c driver explicitly checks for the ACK from the MAC PCS, >>>> and prints "SGMII link is not ok" otherwise, and refuses to bring the >>>> link up. This hurts us in 4.19 because I think the check is a bit >>>> misplaced in the .aneg_done callback. To be precise, what we observe >>>> is that this function is not called by the state machine a second, >>>> third time etc to recheck if the AN has completed in the meantime. In >>>> current net-next, as far as I could figure out, at803x_aneg_done is >>>> dead code. What is ironic about the commit f62265b53ef3 ("at803x: >>>> double check SGMII side autoneg") that introduced this function is >>>> that it's for the gianfar driver (Freescale eTSEC), a MAC that has >>>> never supported reprogramming itself based on the in-band config word. >>>> In fact, if you look at gfar_configure_serdes, it even configures its >>>> register 0x4 with an advertisement for 1000Base-X, not SGMII (0x4001). >>>> So I really wonder if there is any real purpose to this check in >>>> at803x_aneg_done, and if not, I would respectfully remove it. >>> >>> Please check whether at803x will pass data if the SGMII config exchange >>> has not completed - I'm aware of some PHYs that, although link comes up >>> on the copper side, if AN does not complete on the SGMII side, they >>> will not pass data, even if the MAC side is forced up. >>> >>> I don't see any configuration bits in the 8031 that suggest the SGMII >>> config exchange can be bypassed. >>> >>>> - The vsc8514 PHY driver configures SerDes AN in U-Boot, but not in >>>> Linux. So we observe that if we disable PHY configuration in U-Boot, >>>> in-band AN breaks in Linux. We are actually wondering how we should >>>> fix this: from what you wrote above, it seems ok to hardcode SGMII AN >>>> in the PHY driver, and just ignore it in the PCS if managed = >>>> "in-band-status" is not set with PHYLINK. But as you said, in the >>>> general case maybe not all PHYs work until they haven't received the >>>> ACK from the MAC PCS, which makes this insufficient as a general >>>> solution. >>>> >>>> But the 2 cases above illustrate the lack of consistency among PHY >>>> drivers w.r.t. in-band aneg. >>> >>> Indeed - it's something of a mine field at the moment, because we aren't >>> quite sure whether "SGMII" means that the PHY requires in-band AN or >>> doesn't provide it. For the Broadcom case I mentioned, when it's used on >>> a SFP, I've had to add a quirk to phylink to work around it. >>> >>> The problem is, it's not a case that the MAC can demand that the PHY >>> provides in-band config - some PHYs are incapable of doing so. Whatever >>> solution we come up with needs to be a "negotiation" between the PHY >>> driver and the MAC driver for it to work well in the known scenarios - >>> like the case with the Broadcom PHY on a SFP that can be plugged into >>> any SFP supporting network interface... >> >> Some sort of capability negotiation does seem to be the proper solution. >> We can have a new capabilities field in phydev for system interface >> capabilities and match that with MAC capabilities, configuration, factor >> in the quirks. The result would tell if a solution is possible, >> especially with quirky PHYs, and if PHY drivers need to enable AN. >> >> Until we have that in place, any recommended approach for PHY drivers, >> is it acceptable to hardcode system side AN on as a short term fix? >> I've just tested VSC8514 and it doesn't allow traffic through if SI AN >> is enabled but does not complete. We do use it with AN on on NXP >> systems, and it only works because U-Boot sets things up that way, but >> relying on U-Boot isn't great. >> Aquantia PHYs we use also require AN to complete if enabled. For them >> Linux depends on U-Boot or on PHY firmware to enable AN. I don't know >> if anyone out there uses these PHYs with AN off. Would a patch that >> hardcodes AN on for any of these PHYs be acceptable? > > I'm not sure why you're talking about hard-coding anything. As I've > already mentioned, phylink allows you to specify today whether you > want to use in-band AN or not, provided the MAC implements it as is > done with mvneta and mvpp2. I was asking about PHY drivers, not MAC, in the meantime I noticed phy_device carries a pointer to phylink. I assume it's OK if PHY drivers check link_an_mode and set up PHY system interface based on it. Thanks! Alex
On Wed, Dec 18, 2019 at 08:15:59PM +0000, Alexandru Marginean wrote: > On 12/18/2019 6:22 PM, Russell King - ARM Linux admin wrote: > > On Wed, Dec 18, 2019 at 03:00:41PM +0000, Alexandru Marginean wrote: > >> On 12/18/2019 2:29 PM, Russell King - ARM Linux admin wrote: > >>> On Wed, Dec 18, 2019 at 03:21:02PM +0200, Vladimir Oltean wrote: > >>>> - The at803x.c driver explicitly checks for the ACK from the MAC PCS, > >>>> and prints "SGMII link is not ok" otherwise, and refuses to bring the > >>>> link up. This hurts us in 4.19 because I think the check is a bit > >>>> misplaced in the .aneg_done callback. To be precise, what we observe > >>>> is that this function is not called by the state machine a second, > >>>> third time etc to recheck if the AN has completed in the meantime. In > >>>> current net-next, as far as I could figure out, at803x_aneg_done is > >>>> dead code. What is ironic about the commit f62265b53ef3 ("at803x: > >>>> double check SGMII side autoneg") that introduced this function is > >>>> that it's for the gianfar driver (Freescale eTSEC), a MAC that has > >>>> never supported reprogramming itself based on the in-band config word. > >>>> In fact, if you look at gfar_configure_serdes, it even configures its > >>>> register 0x4 with an advertisement for 1000Base-X, not SGMII (0x4001). > >>>> So I really wonder if there is any real purpose to this check in > >>>> at803x_aneg_done, and if not, I would respectfully remove it. > >>> > >>> Please check whether at803x will pass data if the SGMII config exchange > >>> has not completed - I'm aware of some PHYs that, although link comes up > >>> on the copper side, if AN does not complete on the SGMII side, they > >>> will not pass data, even if the MAC side is forced up. > >>> > >>> I don't see any configuration bits in the 8031 that suggest the SGMII > >>> config exchange can be bypassed. > >>> > >>>> - The vsc8514 PHY driver configures SerDes AN in U-Boot, but not in > >>>> Linux. So we observe that if we disable PHY configuration in U-Boot, > >>>> in-band AN breaks in Linux. We are actually wondering how we should > >>>> fix this: from what you wrote above, it seems ok to hardcode SGMII AN > >>>> in the PHY driver, and just ignore it in the PCS if managed = > >>>> "in-band-status" is not set with PHYLINK. But as you said, in the > >>>> general case maybe not all PHYs work until they haven't received the > >>>> ACK from the MAC PCS, which makes this insufficient as a general > >>>> solution. > >>>> > >>>> But the 2 cases above illustrate the lack of consistency among PHY > >>>> drivers w.r.t. in-band aneg. > >>> > >>> Indeed - it's something of a mine field at the moment, because we aren't > >>> quite sure whether "SGMII" means that the PHY requires in-band AN or > >>> doesn't provide it. For the Broadcom case I mentioned, when it's used on > >>> a SFP, I've had to add a quirk to phylink to work around it. > >>> > >>> The problem is, it's not a case that the MAC can demand that the PHY > >>> provides in-band config - some PHYs are incapable of doing so. Whatever > >>> solution we come up with needs to be a "negotiation" between the PHY > >>> driver and the MAC driver for it to work well in the known scenarios - > >>> like the case with the Broadcom PHY on a SFP that can be plugged into > >>> any SFP supporting network interface... > >> > >> Some sort of capability negotiation does seem to be the proper solution. > >> We can have a new capabilities field in phydev for system interface > >> capabilities and match that with MAC capabilities, configuration, factor > >> in the quirks. The result would tell if a solution is possible, > >> especially with quirky PHYs, and if PHY drivers need to enable AN. > >> > >> Until we have that in place, any recommended approach for PHY drivers, > >> is it acceptable to hardcode system side AN on as a short term fix? > >> I've just tested VSC8514 and it doesn't allow traffic through if SI AN > >> is enabled but does not complete. We do use it with AN on on NXP > >> systems, and it only works because U-Boot sets things up that way, but > >> relying on U-Boot isn't great. > >> Aquantia PHYs we use also require AN to complete if enabled. For them > >> Linux depends on U-Boot or on PHY firmware to enable AN. I don't know > >> if anyone out there uses these PHYs with AN off. Would a patch that > >> hardcodes AN on for any of these PHYs be acceptable? > > > > I'm not sure why you're talking about hard-coding anything. As I've > > already mentioned, phylink allows you to specify today whether you > > want to use in-band AN or not, provided the MAC implements it as is > > done with mvneta and mvpp2. > > I was asking about PHY drivers, not MAC, in the meantime I noticed > phy_device carries a pointer to phylink. I assume it's OK if PHY > drivers check link_an_mode and set up PHY system interface based on it. Definitely not on several counts: 1. it's an opaque structure to everything but phylink itself (you may notice that it's defined in phylink.c, and that is why - it is unsafe to go poking about in it.) Lessons should've been learnt from phylib allowing MAC drivers to go poking about in phylib internal state (which I hear phylib maintainers regret allowing.) This is an intentional design choice on my part to ensure that we do not fall into the same trap. 2. phylink really needs to know what the PHY properties are before the driver goes poking about in phylink - the current phylink state at a particular point in time may not be the operating state ultimately chosen for the PHY. 3. phylib may not be used with phylink, and we need phylib to continue to work in phylink-less systems. (2) is the killer, because it's useless trying to look at the phylink state when the PHY is "attached" (and hence config_init is called.) To see why we need PHY property information earlier, look at phylink.c in net-next, specifically: 1. phylink_bringup_phy(), where we have a special case for Clause 45 PHYs that switch their interface modes. 2. phylink_phy_no_inband() and phylink_sfp_connect_phy() which special-cases the Broadcom PHY that provides no in-band AN. That is particularly relevant to my point (2) above. 3. phylink_sfp_connect_phy() needing to know the capabilities of the PHY before the PHY driver has been attached, so that we can select a particular interface mode that is suitable for the PHY to co-operate with the MAC.
On 12/19/2019 12:21 AM, Russell King - ARM Linux admin wrote: > On Wed, Dec 18, 2019 at 08:15:59PM +0000, Alexandru Marginean wrote: >> On 12/18/2019 6:22 PM, Russell King - ARM Linux admin wrote: >>> On Wed, Dec 18, 2019 at 03:00:41PM +0000, Alexandru Marginean wrote: >>>> On 12/18/2019 2:29 PM, Russell King - ARM Linux admin wrote: >>>>> On Wed, Dec 18, 2019 at 03:21:02PM +0200, Vladimir Oltean wrote: >>>>>> - The at803x.c driver explicitly checks for the ACK from the MAC PCS, >>>>>> and prints "SGMII link is not ok" otherwise, and refuses to bring the >>>>>> link up. This hurts us in 4.19 because I think the check is a bit >>>>>> misplaced in the .aneg_done callback. To be precise, what we observe >>>>>> is that this function is not called by the state machine a second, >>>>>> third time etc to recheck if the AN has completed in the meantime. In >>>>>> current net-next, as far as I could figure out, at803x_aneg_done is >>>>>> dead code. What is ironic about the commit f62265b53ef3 ("at803x: >>>>>> double check SGMII side autoneg") that introduced this function is >>>>>> that it's for the gianfar driver (Freescale eTSEC), a MAC that has >>>>>> never supported reprogramming itself based on the in-band config word. >>>>>> In fact, if you look at gfar_configure_serdes, it even configures its >>>>>> register 0x4 with an advertisement for 1000Base-X, not SGMII (0x4001). >>>>>> So I really wonder if there is any real purpose to this check in >>>>>> at803x_aneg_done, and if not, I would respectfully remove it. >>>>> >>>>> Please check whether at803x will pass data if the SGMII config exchange >>>>> has not completed - I'm aware of some PHYs that, although link comes up >>>>> on the copper side, if AN does not complete on the SGMII side, they >>>>> will not pass data, even if the MAC side is forced up. >>>>> >>>>> I don't see any configuration bits in the 8031 that suggest the SGMII >>>>> config exchange can be bypassed. >>>>> >>>>>> - The vsc8514 PHY driver configures SerDes AN in U-Boot, but not in >>>>>> Linux. So we observe that if we disable PHY configuration in U-Boot, >>>>>> in-band AN breaks in Linux. We are actually wondering how we should >>>>>> fix this: from what you wrote above, it seems ok to hardcode SGMII AN >>>>>> in the PHY driver, and just ignore it in the PCS if managed = >>>>>> "in-band-status" is not set with PHYLINK. But as you said, in the >>>>>> general case maybe not all PHYs work until they haven't received the >>>>>> ACK from the MAC PCS, which makes this insufficient as a general >>>>>> solution. >>>>>> >>>>>> But the 2 cases above illustrate the lack of consistency among PHY >>>>>> drivers w.r.t. in-band aneg. >>>>> >>>>> Indeed - it's something of a mine field at the moment, because we aren't >>>>> quite sure whether "SGMII" means that the PHY requires in-band AN or >>>>> doesn't provide it. For the Broadcom case I mentioned, when it's used on >>>>> a SFP, I've had to add a quirk to phylink to work around it. >>>>> >>>>> The problem is, it's not a case that the MAC can demand that the PHY >>>>> provides in-band config - some PHYs are incapable of doing so. Whatever >>>>> solution we come up with needs to be a "negotiation" between the PHY >>>>> driver and the MAC driver for it to work well in the known scenarios - >>>>> like the case with the Broadcom PHY on a SFP that can be plugged into >>>>> any SFP supporting network interface... >>>> >>>> Some sort of capability negotiation does seem to be the proper solution. >>>> We can have a new capabilities field in phydev for system interface >>>> capabilities and match that with MAC capabilities, configuration, factor >>>> in the quirks. The result would tell if a solution is possible, >>>> especially with quirky PHYs, and if PHY drivers need to enable AN. >>>> >>>> Until we have that in place, any recommended approach for PHY drivers, >>>> is it acceptable to hardcode system side AN on as a short term fix? >>>> I've just tested VSC8514 and it doesn't allow traffic through if SI AN >>>> is enabled but does not complete. We do use it with AN on on NXP >>>> systems, and it only works because U-Boot sets things up that way, but >>>> relying on U-Boot isn't great. >>>> Aquantia PHYs we use also require AN to complete if enabled. For them >>>> Linux depends on U-Boot or on PHY firmware to enable AN. I don't know >>>> if anyone out there uses these PHYs with AN off. Would a patch that >>>> hardcodes AN on for any of these PHYs be acceptable? >>> >>> I'm not sure why you're talking about hard-coding anything. As I've >>> already mentioned, phylink allows you to specify today whether you >>> want to use in-band AN or not, provided the MAC implements it as is >>> done with mvneta and mvpp2. >> >> I was asking about PHY drivers, not MAC, in the meantime I noticed >> phy_device carries a pointer to phylink. I assume it's OK if PHY >> drivers check link_an_mode and set up PHY system interface based on it. > > Definitely not on several counts: > > 1. it's an opaque structure to everything but phylink itself (you > may notice that it's defined in phylink.c, and that is why - it > is unsafe to go poking about in it.) Lessons should've been > learnt from phylib allowing MAC drivers to go poking about in > phylib internal state (which I hear phylib maintainers regret > allowing.) This is an intentional design choice on my part to > ensure that we do not fall into the same trap. > > 2. phylink really needs to know what the PHY properties are before > the driver goes poking about in phylink - the current phylink > state at a particular point in time may not be the operating state > ultimately chosen for the PHY. > > 3. phylib may not be used with phylink, and we need phylib to > continue to work in phylink-less systems. > > (2) is the killer, because it's useless trying to look at the phylink > state when the PHY is "attached" (and hence config_init is called.) My bad, I just assumed without checking that phylink is more like phylib than it actually is. (3) is probably less of a problem, phy drivers could check if phylink pointers is NULL, but it doesn't matter anyway, since the phylink structure is opaque. > To see why we need PHY property information earlier, look at > phylink.c in net-next, specifically: > > 1. phylink_bringup_phy(), where we have a special case for Clause 45 > PHYs that switch their interface modes. > > 2. phylink_phy_no_inband() and phylink_sfp_connect_phy() which > special-cases the Broadcom PHY that provides no in-band AN. > That is particularly relevant to my point (2) above. > > 3. phylink_sfp_connect_phy() needing to know the capabilities of > the PHY before the PHY driver has been attached, so that we can > select a particular interface mode that is suitable for the PHY > to co-operate with the MAC. > This is all fair and I'm not arguing against a proper solution that involves a capability exchange between PHY and MAC. Until we have that in place, or if phylib is used, PHY drivers should probably apply a recommended configuration in config_init based on information available to them at that time. For phy-mode = "sgmii" or "qsgmii" I think PHY drivers should enable AN on system side by default if that's an option in the PHY hardware. Currently some seem to do it, other don't. In that context vsc8514 config_init could hardcode enabling of AN, phy-mode wouldn't even matter as the PHY only supports qsgmii. This kind of change does not address the overall problem but at least makes PHYs more uniform and predictable. In the proper solution phylink could later determine what the PHY configuration should be based on MAC capabilities and configuration and re-apply it using a new op, potentially turning off system side AN in vsc8514 case. Does this look reasonable? Thanks! Alex
From: Vladimir Oltean <vladimir.oltean@nxp.com> Unlike most other conversions, and unlike this series' own v1 [0] from which it is radically different, this conversion is not by far a trivial one, and should be seen as "Layerscape PCS meets PHYLINK". Depending on the feedback received, the PCS driver from felix_vsc9959 can in fact be generalized to cover most of the Layerscape devices, or we can scrap the idea. Actually, the PCS doesn't need a lot of hand-holding and most of our other devices 'just work' (this one included) without any sort of operating system awareness, just an initialization procedure done typically in the bootloader. The PCS is not specific to the Vitesse / Microsemi / Microchip switching core at all. Variations of this SerDes/PCS design can also be found on DPAA1 and DPAA2 hardware. The main idea of the abstraction provided is that the PCS looks so much like a PHY device, that we model it as an actual PHY device and run the generic PHY functions on it, where appropriate. The 4xSGMII, QSGMII and QSXGMII modes are fairly straightforward. The SerDes protocol which the driver calls 2500Base-X mode (a misnomer) is more interesting. There is a description of how it works and what can be done with it in patch 8/8 (in a comment above vsc9959_pcs_init_2500basex). In short, it is a fixed speed protocol with no auto-negotiation whatsoever. From my research of the SGMII-2500 patent [1], it has nothing to do with SGMII-2500. That one: * does not define any change to the AN base page compared to plain 10/100/1000 SGMII. This implies that the 2500 speed is not negotiable, but the other speeds are. In our case, when the SerDes is configured for this protocol it's configured for good, there's no going back to SGMII. * runs at a higher base frequency than regular SGMII. So SGMII-2500 operating at 1000 Mbps wouldn't interoperate with plain SGMII at 1000 Mbps. Strange, but ok.. * Emulates lower link speeds than 2500 by duplicating the codewords twice, then thrice, then twice again etc (2.5/25/250 times on average). The Layerscape PCS doesn't do that (it is fixed at 2500 Mbaud). But on the other hand it isn't Base-X either, since it doesn't do 802.3z / clause 37 auto negotiation (flow control, local/remote fault etc). So it is a protocol in its own right (a rather fixed one). If reviewers think it should have its own phy-mode, different than 2500base-x, I'm in favor of that. It's just that I couldn't find a suitable name for it. quarter-xaui? 3125mhz-8b10b? When inter-operating with the Aquantia AQR112 and AQR412 PHYs in this mode (for which the PHY uses a proprietary name "OCSGMII"), we do still need to support the lower link speeds negotiated by the PHY on copper side. So what we typically do is we enable rate adaptation in the PHY firmware, with pause frames and a fixed link speed on the system side. Raising this as a discussion item to understand how we can model this quirky operating mode in Linux without imposing limitations on others (we have some downstream patches on the Aquantia PHY driver as well). Another item to discuss is whether we should be able to disable AN in the PCS independently of whether we have a PHY as a peer or not. With an SGMII PHY-less connection, there may be no auto-negotiation master so the link partners will bounce for a while and then they'll settle on 10 Mbps as link speed. For those connections (such as an SGMII link to a downstream switch) we need to disable AN in the PCS and force the speed to 1000. So: * If we have a PHY we want to do auto-neg * If we don't have a PHY, maybe we want AN, maybe we don't * In the 2500Base-X mode, we can't do AN because the hardware doesn't support it I have found the 'managed = "in-band-status"' DTS binding to somewhat address this concern, but I am a bit reluctant to disable SGMII AN if it isn't set. We have boards with device trees that need to be migrated from PHYLIB and I am concerned that changing the behavior w.r.t. in-band AN (when the "in-band-status" property is not present) is not going to be pleasant. I do understand that the "in-band-status" property is primarily intended to be used with PHY-less setups, and it looks like the fact it does work with PHYs as well is more of an oversight (as patch 2/8 shows). So I'm not sure who else really uses it with a phy-handle. There are some more open questions in patch 8/8. I dropped the Ocelot PHYLINK conversion because: * I don't have VSC7514 hardware anyway * The hardware is so different in this regard that there's almost nothing to share anyway. [0]: https://www.spinics.net/lists/netdev/msg613869.html [1]: https://patents.google.com/patent/US7356047B1/en Claudiu Manoil (1): enetc: Make MDIO accessors more generic and export to include/linux/fsl Vladimir Oltean (7): mii: Add helpers for parsing SGMII auto-negotiation net: phylink: make QSGMII a valid PHY mode for in-band AN net: phylink: call mac_an_restart for SGMII/QSGMII inband interfaces too enetc: Set MDIO_CFG_HOLD to the recommended value of 2 net: mscc: ocelot: make phy_mode a member of the common struct ocelot_port net: mscc: ocelot: export ANA, DEV and QSYS registers to include/soc/mscc net: dsa: felix: Add PCS operations for PHYLINK drivers/net/dsa/ocelot/Kconfig | 1 + drivers/net/dsa/ocelot/felix.c | 260 ++++++++- drivers/net/dsa/ocelot/felix.h | 16 +- drivers/net/dsa/ocelot/felix_vsc9959.c | 475 +++++++++++++++- drivers/net/ethernet/freescale/enetc/enetc_hw.h | 1 + drivers/net/ethernet/freescale/enetc/enetc_mdio.c | 88 ++- drivers/net/ethernet/freescale/enetc/enetc_mdio.h | 12 - .../net/ethernet/freescale/enetc/enetc_pci_mdio.c | 43 +- drivers/net/ethernet/mscc/ocelot.c | 7 +- drivers/net/ethernet/mscc/ocelot.h | 7 +- drivers/net/ethernet/mscc/ocelot_ana.h | 625 --------------------- drivers/net/ethernet/mscc/ocelot_board.c | 4 +- drivers/net/ethernet/mscc/ocelot_dev.h | 275 --------- drivers/net/ethernet/mscc/ocelot_qsys.h | 270 --------- drivers/net/phy/phylink.c | 5 +- include/linux/fsl/enetc_mdio.h | 34 ++ include/linux/mii.h | 50 ++ include/soc/mscc/ocelot.h | 2 + include/soc/mscc/ocelot_ana.h | 625 +++++++++++++++++++++ include/soc/mscc/ocelot_dev.h | 275 +++++++++ include/soc/mscc/ocelot_qsys.h | 270 +++++++++ include/uapi/linux/mii.h | 10 + 22 files changed, 2100 insertions(+), 1255 deletions(-) delete mode 100644 drivers/net/ethernet/freescale/enetc/enetc_mdio.h delete mode 100644 drivers/net/ethernet/mscc/ocelot_ana.h delete mode 100644 drivers/net/ethernet/mscc/ocelot_dev.h delete mode 100644 drivers/net/ethernet/mscc/ocelot_qsys.h create mode 100644 include/linux/fsl/enetc_mdio.h create mode 100644 include/soc/mscc/ocelot_ana.h create mode 100644 include/soc/mscc/ocelot_dev.h create mode 100644 include/soc/mscc/ocelot_qsys.h