mbox series

[RFC,v2,net-next,00/15] Add C72/C73 copper backplane support for LX2160

Message ID 20230923134904.3627402-1-vladimir.oltean@nxp.com
Headers show
Series Add C72/C73 copper backplane support for LX2160 | expand

Message

Vladimir Oltean Sept. 23, 2023, 1:48 p.m. UTC
THIS PATCH SET DOES NOT WORK, AND IT ISN'T INTENDED TO WORK.

Changes in v2:
- replace phy_check_cdr_lock() with phy_get_status(PHY_STATUS_CDR_LOCK)
- rename PHY_MODE_ETHERNET_PHY to PHY_MODE_ETHTOOL
- stop describing the AN/LT block in the device tree and make use of the
  fact that it is discoverable. Add phy_get_status(PHY_STATUS_PCVT_ADDR)
  to the generic PHY layer to discover it.
- add the 25GBase-KR-S and 25GBase-CR-S link modes. Proper treatment of
  RS-FEC and BASE-R FEC is still TODO (will also require new API in the
  generic PHY layer).
- rework the implementation from a phylib phy_device to a phylink_pcs
  component (library code). The phy-mode is still "internal". It may or
  may have not been the right thing to do. There are some things to say
  about that on patch 08/15.
- support multi-lane link modes - tested with 40GBase-KR4
- solve the pre-configuration and register access problem for 1000Base-KX
  by having mtip_get_mdiodev() pre-configure the SerDes lane and
  protocol converter for the highest supported backplane link mode.

The original cover letter for the RFC v1 at
https://patchwork.kernel.org/project/netdevbpf/cover/20230817150644.3605105-1-vladimir.oltean@nxp.com/
is still attached below.

=================================================

I've been tasked with maintaining the copper backplane support on
Layerscape SoCs. There was a previous attempt to upstream that, which is
located here:
https://lore.kernel.org/lkml/1587732391-3374-1-git-send-email-florinel.iordache@nxp.com/

Nonetheless, I am presenting here a complete rewrite based on my
understanding. The quality of implementation is not great, and there are
probably bugs which I've yet to identify. The reason for this RFC is to
collect feedback for the overall design from networking PHY and generic
PHY maintainers.

The newly proposed generic PHY API has the mtip_backplane.c driver as a
user (a consumer), but its actual hardware implementation (which should
go in drivers/phy/freescale/phy-fsl-lynx-28g.c), is not included in this
patch set. That would just have just uselessly distracted the reviewers'
attention. This is why the patch set, as posted, does not work.

I recommend reviewing from the bottom up - dt-bindings first, those give
an overall picture of the AN/LT block and its integration in the SoC.

Future work consists of:

- supporting multi-lane link modes

- advertising more than a single link mode, and performing dynamic
  SerDes protocol switching towards that link mode. Strictly speaking,
  the hardware was intended to be used with a single link mode advertised
  over C73 (the link mode pre-configured through the RCW - Reset
  Configuration Word), and that is quite apparent in its design. With
  some inventive workarounds which I've yet to try out, it might be
  possible to circumvent the design limitations and advertise any link
  mode that the SerDes PLLs can sustain. This is in an exploratory stage
  and in the end it might not come to fruition, but I've identified some
  aspects which require forethought in the driver's design.

Both these features hit a wall given the current driver design, which is
"how do we access the AN/LT block's registers?".

The hardware designers were inconsistent in the way that they've
integrated this AN/LT blocks for 1G/10G ports, vs how they did it for
25G/40G/50G/100G ports. There is always one single AN/LT block per
SerDes lane, but for 1G/10G modes, hardware design wanted to make it
appear as if the AN/LT is part of the PCS. So it inherently responds to
the same MDIO address as the PCS. Whereas in the >25G modes, it responds
to an MDIO address defined by a different control register, and that
address is also different from the PCS' MDIO address.

In the current dt-bindings, I am requesting DT writers to put a
phy-handle towards the MDIO address of the AN/LT block (which depends on
a lot of things, see the dt-bindings document for details).

As opposed to regular ports where the PCS and SerDes PHY are controlled
by the MAC driver (drivers/net/ethernet/freescale/dpaa2/dpaa2-mac.c),
with backplane, those same components are controlled by the
mtip_backplane.c driver. The backplane AN/LT driver has a pcs-handle
towards the PCS, and it treats it as a raw MDIO device when polling for
link status.

This gets obviously problematic when switching SerDes protocols, because
the location of the backplane AN/LT block changes on the MDIO bus with
the new SerDes protocol, and what's in the device tree becomes invalid
(the phydev->mdio.addr would need to be updated). It's the same AN/LT
block, and after a SerDes protocol change it has been reset, but it moved.

The SerDes control registers for the MDIO addresses of the PCS and AN/LT blocks:
- SGMIInCR1[MDEV_PORT]
- SXGMIInCR1[MDEV_PORT]
- ANLTnCR1[MDEV_PORT]
- E25GnCR1[MDEV_PORT]
- E40GnCR1[MDEV_PORT]
- E50GnCR1[MDEV_PORT]
- E100GnCR1[MDEV_PORT]

are in premise all configurable, but it's an open question to me as to
how Linux should configure them. Currently, we treat all those addresses
as read-only and populate the device trees based on their default values.

There's an additional (related) problem for 1000base-KX. The lane starts
up at power-on reset in 1000base-X mode (non-backplane) and this means
that the PCS which responds to MDIO commands is the one used for
SGMII/1000Base-X by drivers/net/pcs/pcs-lynx.c. That responds to C22
MDIO commands. The PCS for 1000Base-KX is different, and it responds to
C45 MDIO commands. It also incorporates the AN/LT block at this C45 MDIO
address. In the current mtip_backplane.c driver design, the lane switches
to backplane mode (thus enabling the respective PCS) once the link mode
was resolved to 1000base-KX through C73 autoneg, using phy_set_mode_ext().
But that is too late with 1000base-KX, since the AN/LT block won't
respond to C45 transactions either, if the port isn't pre-configured for
1000base-KX. So C73 autoneg won't work to begin with... Somebody needs
to pre-configure the SerDes lane for backplane mode, so that the
mtip_backplane.c driver can probe, but it's not clear who and based on
what.

Finally, I reckon that in the end, the PCS should be driven using a
struct phylink_pcs, by the lynx_pcs driver, and not as an mdio_device
using raw accesses by the mtip_backplane.c. In premise, this AN/LT IP
core can be integrated, to my knowledge, with any PCS and any SerDes,
so that should be also possible in the driver design. Yet, there's a
problem: in the 1G/10G modes, there would be 2 drivers for the same
mdio_device: one for the PCS and the other for the AN/LT block (PHY).
True, they are at different MMDs, but bindings multiple Linux drivers to
mdio_devices per MMD is not a thing, I guess?

Interrupt support is also missing, and that's very far away on my TODO
list. AFAIU, PCS/ANLT interrupts are routed by the SoC to the attached
MAC's IEVENT register, which isn't enabled as an interrupt source on any
Layerscape platform yet. I've structured the code using an irqpoll
thread for now, so it is more-or-less just as event-driven as an IRQ
based driver would be.

As a side note, I have analyzed the feedback previously given to Florinel,
especially the one from Russell:
https://lore.kernel.org/lkml/20200425105210.GZ25745@shell.armlinux.org.uk/

"This uses phylib, which is a problem ..."

and I still haven't changed that here. Without going into a wall of text
explanation, I'm not fully convinced that phylib isn't, in fact, the
best place for a backplane AN/LT driver to live. At least in the initial
implementation shown here, I'm not sure that going straight for a
phylink implementation of the AN/LT block would have solved any of the
problems that I described above.

Vladimir Oltean (15):
  phy: introduce phy_get_status() and use it to report CDR lock
  phy: introduce the PHY_MODE_ETHTOOL mode for phy_set_mode_ext()
  phy: ethernet: add configuration interface for copper backplane
    Ethernet PHYs
  phy: allow querying the address of protocol converters through
    phy_get_status()
  net: add 25GBase-KR-S and 25GBase-CR-S to ethtool link mode UAPI
  net: mii: add C73 base page helpers
  net: phylink: centralize phy_interface_mode_is_8023z() &&
    phylink_autoneg_inband() checks
  net: phylink: allow PCS to handle C73 autoneg for phy-mode =
    "internal"
  net: ethtool: introduce ethtool_link_mode_str()
  net: phylink: support all ethtool port modes with inband modes
  net: phylink: support the 25GBase-KR-S and 25GBase-CR-S link modes too
  net: phylink: add the 25G link modes to
    phylink_c73_priority_resolution[]
  dt-bindings: lynx-pcs: add properties for backplane mode
  net: pcs: mtip_backplane: add driver for MoreThanIP backplane AN/LT
    core
  net: pcs: lynx: use MTIP AN/LT block for copper backplanes

 .../bindings/net/pcs/fsl,lynx-pcs.yaml        |   15 +-
 drivers/net/mii.c                             |   34 +-
 drivers/net/pcs/Kconfig                       |    8 +
 drivers/net/pcs/Makefile                      |    1 +
 drivers/net/pcs/mtip_backplane.c              | 2022 +++++++++++++++++
 drivers/net/pcs/mtip_backplane.h              |   87 +
 drivers/net/pcs/pcs-lynx.c                    |  135 ++
 drivers/net/phy/phy-core.c                    |    2 +-
 drivers/net/phy/phylink.c                     |   53 +-
 drivers/phy/phy-core.c                        |   31 +
 include/linux/ethtool.h                       |    6 +
 include/linux/mii.h                           |  105 +
 include/linux/phy/phy-ethernet.h              |  292 +++
 include/linux/phy/phy.h                       |   83 +
 include/linux/phylink.h                       |    1 +
 include/uapi/linux/ethtool.h                  |    2 +
 net/ethtool/common.c                          |   12 +
 17 files changed, 2876 insertions(+), 13 deletions(-)
 create mode 100644 drivers/net/pcs/mtip_backplane.c
 create mode 100644 drivers/net/pcs/mtip_backplane.h
 create mode 100644 include/linux/phy/phy-ethernet.h

Comments

Simon Horman Sept. 28, 2023, 1:36 p.m. UTC | #1
On Sat, Sep 23, 2023 at 04:48:52PM +0300, Vladimir Oltean wrote:
> In Layerscape and QorIQ SoCs, compliance with the backplane Ethernet
> protocol is bolted on top of the SerDes lanes using an external IP core,
> that is modeled as an Ethernet PHY. This means that dynamic tuning of
> the electrical equalization parameters of the link needs to be
> communicated with the consumer of the generic PHY.
> 
> Create a small layer of glue API between a networking PHY (dealing with
> the AN/LT logic for backplanes) and a generic PHY by extending the
> phy_configure() API with a new struct phy_configure_opts_ethernet.
> 
> There are 2 directions of interest. In the "local TX training", the
> generic PHY consumer gets requests over the wire from the link partner
> regarding changes we should make to our TX equalization. In the "remote
> TX training" direction, the generic PHY is the producer of requests,
> based on its RX status, and the generic PHY consumer polls for these
> requests until we are happy. Each request is also sent (externally to
> the generic PHY layer) to the link partner board, for it to adjust its
> TX equalization.
> 
> struct phy_configure_opts_ethernet is valid when phy_set_mode_ext() has
> been called with PHY_MODE_ETHERNET or PHY_MODE_ETHTOOL, same as with
> other union phy_configure_opts types.
> 
> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>

...

> +/**
> + * struct phy_configure_opts_ethernet - Ethernet PHY configuration set

nit: please include documentation of the structure members - type,
local_tx, and remote_tx - here.

> + *
> + * This structure is used to represent the configuration state of an Ethernet
> + * PHY (of various media types).
> + */
> +struct phy_configure_opts_ethernet {
> +	enum ethernet_phy_configure_type type;
> +	union {
> +		struct c72_phy_configure_local_tx local_tx;
> +		struct c72_phy_configure_remote_tx remote_tx;
> +	};
> +};

...
Simon Horman Sept. 28, 2023, 1:38 p.m. UTC | #2
On Sat, Sep 23, 2023 at 04:48:56PM +0300, Vladimir Oltean wrote:
> In a future change, we will extend the PHY interface modes for which
> phylink allows the PCS to handle autoneg. Group the existing occurences

nit: occurrences

...
Simon Horman Sept. 28, 2023, 7:05 p.m. UTC | #3
On Sat, Sep 23, 2023 at 04:48:52PM +0300, Vladimir Oltean wrote:

...

> +/**
> + * coef_update_opposite - return the opposite of one C72 coefficient update
> + *			  request
> + *
> + * @update:	original coefficient update
> + *
> + * Helper to transform the update request of one equalization tap into a
> + * request of the same tap in the opposite direction. May be used by C72
> + * phy remote TX link training algorithms.
> + */
> +static inline enum coef_update coef_update_opposite(enum coef_update update)

Hi Vladimir,

another nit from me.

Please put the inline keyword first.
Likewise elsewhere in this patch.

Tooling, including gcc-13 with W=1, complains about this.

> +{
> +	switch (update) {
> +	case COEF_UPD_INC:
> +		return COEF_UPD_DEC;
> +	case COEF_UPD_DEC:
> +		return COEF_UPD_INC;
> +	default:
> +		return COEF_UPD_HOLD;
> +	}
> +}

...
Simon Horman Sept. 28, 2023, 7:06 p.m. UTC | #4
On Sat, Sep 23, 2023 at 04:49:03PM +0300, Vladimir Oltean wrote:

...

> +static int mtip_rx_c72_coef_update(struct mtip_backplane *priv,
> +				   struct c72_coef_update *upd,
> +				   bool *rx_ready)
> +{
> +	char upd_buf[C72_COEF_UPDATE_BUFSIZ], stat_buf[C72_COEF_STATUS_BUFSIZ];
> +	struct device *dev = &priv->mdiodev->dev;
> +	struct c72_coef_status stat = {};
> +	int err, val;
> +
> +	err = read_poll_timeout(mtip_read_lt_lp_coef_if_not_ready,
> +				val, val < 0 || *rx_ready || LT_COEF_UPD_ANYTHING(val),
> +				MTIP_COEF_STAT_SLEEP_US, MTIP_COEF_STAT_TIMEOUT_US,
> +				false, priv, rx_ready);
> +	if (val < 0)
> +		return val;
> +	if (*rx_ready) {
> +		if (!priv->any_request_received)
> +			dev_warn(dev,
> +				 "LP says its RX is ready, but there was no coefficient request (LP_STAT = 0x%x, LD_STAT = 0x%x)\n",
> +				 mtip_read_lt(priv, LT_LP_STAT),
> +				 mtip_read_lt(priv, LT_LD_STAT));
> +		else
> +			dev_dbg(dev, "LP says its RX is ready\n");
> +		return 0;
> +	}
> +	if (err) {
> +		dev_err(dev,
> +			"LP did not request coefficient updates; LP_COEF = 0x%x\n",
> +			val);
> +		return err;
> +	}
> +
> +	upd->com1 = LT_COM1_X(val);
> +	upd->coz = LT_COZ_X(val);
> +	upd->cop1 = LT_COP1_X(val);
> +	upd->init = !!(val & LT_COEF_UPD_INIT);
> +	upd->preset = !!(val & LT_COEF_UPD_PRESET);
> +	

Hi Vladimir,

I'm unsure if this can actually happen.
But if the while loop runs zero times then err is used uninitialised here.

As flagged by Smatch.

> +		mtip_an_restart_from_lt(priv);
> +
> +	kfree(lt_work);
> +}
> +
> +/* Train the link partner TX, so that the local RX quality improves */
> +static void mtip_remote_tx_lt_work(struct kthread_work *work)
> +{
> +	struct mtip_lt_work *lt_work = container_of(work, struct mtip_lt_work,
> +						    work);
> +	struct mtip_backplane *priv = lt_work->priv;
> +	struct device *dev = &priv->mdiodev->dev;
> +	int err;
> +
> +	while (true) {
> +		struct c72_coef_status status = {};
> +		union phy_configure_opts opts = {
> +			.ethernet = {
> +				.type = C72_REMOTE_TX,
> +			},
> +		};
> +
> +		if (READ_ONCE(priv->lt_stop_request))
> +			goto out;

Likewise, I'm unsure if this can happen.
But if the condition above is met on the first iteration of
the loop then the out label will use err without it being initialised.

Also flagged by Smatch.

> +
> +		err = mtip_lt_in_progress(priv);
> +		if (err) {
> +			dev_err(dev, "Remote TX LT failed: %pe\n", ERR_PTR(err));
> +			goto out;
> +		}
> +
> +		err = phy_configure(priv->serdes, &opts);
> +		if (err) {
> +			dev_err(dev,
> +				"Failed to get remote TX training request from SerDes: %pe\n",
> +				ERR_PTR(err));
> +			goto out;
> +		}
> +
> +		if (opts.ethernet.remote_tx.rx_ready)
> +			break;
> +
> +		err = mtip_tx_c72_coef_update(priv, &opts.ethernet.remote_tx.update,
> +					      &status);
> +		if (opts.ethernet.remote_tx.cb)
> +			opts.ethernet.remote_tx.cb(opts.ethernet.remote_tx.cb_priv,
> +						   err, opts.ethernet.remote_tx.update,
> +						   status);
> +		if (err)
> +			goto out;
> +	}
> +
> +	/* Let the link partner know we're done */
> +	err = mtip_modify_lt(priv, LT_LD_STAT, LT_COEF_STAT_RX_READY,
> +			     LT_COEF_STAT_RX_READY);
> +	if (err < 0) {
> +		dev_err(dev, "Failed to update LT_LD_STAT: %pe\n",
> +			ERR_PTR(err));
> +		goto out;
> +	}
> +
> +	err = mtip_remote_tx_lt_done(priv);
> +	if (err) {
> +		dev_err(dev, "Failed to finalize remote LT: %pe\n",
> +			ERR_PTR(err));
> +		goto out;
> +	}
> +
> +out:
> +	if (err && !READ_ONCE(priv->lt_stop_request))
> +		mtip_an_restart_from_lt(priv);
> +
> +	kfree(lt_work);
> +}

...
Vinod Koul Sept. 29, 2023, 4:23 p.m. UTC | #5
On 23-09-23, 16:48, Vladimir Oltean wrote:
> The bit stream handled by a SerDes lane needs protocol converters to be
> usable for Ethernet. On Freescale/NXP SoCs, those protocol converters
> are located on the internal MDIO buses of the Ethernet MACs that need
> them.
> 
> The location on that MDIO bus, on these SoCs, is not fixed, but given by
> some control registers of the SerDes block itself.
> 
> Because no one modifies those addresses from the power-on default, so
> far we've relied on hardcoding the default values in the device trees,
> resulting in something like this:
> 
> 		pcs_mdio1: mdio@8c07000 {
> 			compatible = "fsl,fman-memac-mdio";
> 
> 			pcs1: ethernet-phy@0 {
> 				reg = <0>;
> 			};
> 		};
> 
> where the "reg" of "pcs1" can actually be retrieved from "serdes_1".
> 
> That was for the PCS. For AN/LT blocks, that can also be done, but the
> MAC to PCS to AN/LT block mapping is non-trivial and extremely easy to
> get wrong, which will confuse and frustrate any device tree writers.
> 
> The proposal is to take advantage of the fact that these protocol
> converters *are* discoverable, and to side-step that entire device tree
> mapping issue by not putting them in the device tree at all. So, one of
> the consumers of the SerDes PHY uses the phy_get_status() API to figure
> out the address on the MDIO bus, it also has a reference to the MDIO bus
> => it can create the mdio_device in a non OF-based manner.
> 
> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
> ---
> v1->v2: patch is new
> 
>  include/linux/phy/phy.h | 29 +++++++++++++++++++++++++++++
>  1 file changed, 29 insertions(+)
> 
> diff --git a/include/linux/phy/phy.h b/include/linux/phy/phy.h
> index f1f03fa66943..ee721067517b 100644
> --- a/include/linux/phy/phy.h
> +++ b/include/linux/phy/phy.h
> @@ -56,6 +56,33 @@ enum phy_media {
>  enum phy_status_type {
>  	/* Valid for PHY_MODE_ETHERNET and PHY_MODE_ETHTOOL */
>  	PHY_STATUS_CDR_LOCK,
> +	PHY_STATUS_PCVT_ADDR,
> +};
> +
> +/* enum phy_pcvt_type - PHY protocol converter type

It is not a generic protocol converter but an ethernet phy protocol
converter, so i guess we should add that here (we are generic phy and
not ethernet phy here!

> + *
> + * @PHY_PCVT_ETHERNET_PCS: Ethernet Physical Coding Sublayer, top-most layer of
> + *			   an Ethernet PHY. Connects through MII to the MAC,
> + *			   and handles link status detection and the conversion
> + *			   of MII signals to link-specific code words (8b/10b,
> + *			   64b/66b etc).
> + * @PHY_PCVT_ETHERNET_ANLT: Ethernet Auto-Negotiation and Link Training,
> + *			    bottom-most layer of an Ethernet PHY, beneath the
> + *			    PMA and PMD. Its activity is only visible on the
> + *			    physical medium, and it is responsible for
> + *			    selecting the most adequate PCS/PMA/PMD set that
> + *			    can operate on that medium.
> + */
> +enum phy_pcvt_type {
> +	PHY_PCVT_ETHERNET_PCS,
> +	PHY_PCVT_ETHERNET_ANLT,
> +};
> +
> +struct phy_status_opts_pcvt {
> +	enum phy_pcvt_type type;
> +	union {
> +		unsigned int mdio;
> +	} addr;
>  };
>  
>  /* If the CDR (Clock and Data Recovery) block is able to lock onto the RX bit
> @@ -71,9 +98,11 @@ struct phy_status_opts_cdr {
>   * union phy_status_opts - Opaque generic phy status
>   *
>   * @cdr:	Configuration set applicable for PHY_STATUS_CDR_LOCK.
> + * @pcvt:	Configuration set applicable for PHY_STATUS_PCVT_ADDR.
>   */
>  union phy_status_opts {
>  	struct phy_status_opts_cdr		cdr;
> +	struct phy_status_opts_pcvt		pcvt;
>  };
>  
>  /**
> -- 
> 2.34.1
Vladimir Oltean Oct. 2, 2023, 10:09 a.m. UTC | #6
Hi Vinod,

On Fri, Sep 29, 2023 at 09:53:22PM +0530, Vinod Koul wrote:
> On 23-09-23, 16:48, Vladimir Oltean wrote:
> > The bit stream handled by a SerDes lane needs protocol converters to be
> > usable for Ethernet. On Freescale/NXP SoCs, those protocol converters
> > are located on the internal MDIO buses of the Ethernet MACs that need
> > them.
> > 
> > The location on that MDIO bus, on these SoCs, is not fixed, but given by
> > some control registers of the SerDes block itself.
> > 
> > Because no one modifies those addresses from the power-on default, so
> > far we've relied on hardcoding the default values in the device trees,
> > resulting in something like this:
> > 
> > 		pcs_mdio1: mdio@8c07000 {
> > 			compatible = "fsl,fman-memac-mdio";
> > 
> > 			pcs1: ethernet-phy@0 {
> > 				reg = <0>;
> > 			};
> > 		};
> > 
> > where the "reg" of "pcs1" can actually be retrieved from "serdes_1".
> > 
> > That was for the PCS. For AN/LT blocks, that can also be done, but the
> > MAC to PCS to AN/LT block mapping is non-trivial and extremely easy to
> > get wrong, which will confuse and frustrate any device tree writers.
> > 
> > The proposal is to take advantage of the fact that these protocol
> > converters *are* discoverable, and to side-step that entire device tree
> > mapping issue by not putting them in the device tree at all. So, one of
> > the consumers of the SerDes PHY uses the phy_get_status() API to figure
> > out the address on the MDIO bus, it also has a reference to the MDIO bus
> > => it can create the mdio_device in a non OF-based manner.
> > 
> > Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
> > ---
> > v1->v2: patch is new
> > 
> >  include/linux/phy/phy.h | 29 +++++++++++++++++++++++++++++
> >  1 file changed, 29 insertions(+)
> > 
> > diff --git a/include/linux/phy/phy.h b/include/linux/phy/phy.h
> > index f1f03fa66943..ee721067517b 100644
> > --- a/include/linux/phy/phy.h
> > +++ b/include/linux/phy/phy.h
> > @@ -56,6 +56,33 @@ enum phy_media {
> >  enum phy_status_type {
> >  	/* Valid for PHY_MODE_ETHERNET and PHY_MODE_ETHTOOL */
> >  	PHY_STATUS_CDR_LOCK,
> > +	PHY_STATUS_PCVT_ADDR,
> > +};
> > +
> > +/* enum phy_pcvt_type - PHY protocol converter type
> 
> It is not a generic protocol converter but an ethernet phy protocol
> converter, so i guess we should add that here (we are generic phy and
> not ethernet phy here!

Can you please show, using a diff, what modification you would like to see?
In my mind, enum phy_pcvt_type could also contain non-Ethernet protocol
converter types, and so, I didn't want to add "ethernet_" in it.
The "ETHERNET_" prefix will be part of the individual enum values that
are applicable to Ethernet.

> > + *
> > + * @PHY_PCVT_ETHERNET_PCS: Ethernet Physical Coding Sublayer, top-most layer of
> > + *			   an Ethernet PHY. Connects through MII to the MAC,
> > + *			   and handles link status detection and the conversion
> > + *			   of MII signals to link-specific code words (8b/10b,
> > + *			   64b/66b etc).
> > + * @PHY_PCVT_ETHERNET_ANLT: Ethernet Auto-Negotiation and Link Training,
> > + *			    bottom-most layer of an Ethernet PHY, beneath the
> > + *			    PMA and PMD. Its activity is only visible on the
> > + *			    physical medium, and it is responsible for
> > + *			    selecting the most adequate PCS/PMA/PMD set that
> > + *			    can operate on that medium.
> > + */
> > +enum phy_pcvt_type {
> > +	PHY_PCVT_ETHERNET_PCS,
> > +	PHY_PCVT_ETHERNET_ANLT,
> > +};
> > +
> > +struct phy_status_opts_pcvt {
> > +	enum phy_pcvt_type type;
> > +	union {
> > +		unsigned int mdio;
> > +	} addr;
> >  };
> >  
> >  /* If the CDR (Clock and Data Recovery) block is able to lock onto the RX bit
> > @@ -71,9 +98,11 @@ struct phy_status_opts_cdr {
> >   * union phy_status_opts - Opaque generic phy status
> >   *
> >   * @cdr:	Configuration set applicable for PHY_STATUS_CDR_LOCK.
> > + * @pcvt:	Configuration set applicable for PHY_STATUS_PCVT_ADDR.
> >   */
> >  union phy_status_opts {
> >  	struct phy_status_opts_cdr		cdr;
> > +	struct phy_status_opts_pcvt		pcvt;
> >  };
> >  
> >  /**
> > -- 
> > 2.34.1
> 
> -- 
> ~Vinod
Vladimir Oltean Oct. 2, 2023, 1:11 p.m. UTC | #7
Hi Simon,

On Thu, Sep 28, 2023 at 09:05:36PM +0200, Simon Horman wrote:
> On Sat, Sep 23, 2023 at 04:48:52PM +0300, Vladimir Oltean wrote:
> 
> ...
> 
> > +/**
> > + * coef_update_opposite - return the opposite of one C72 coefficient update
> > + *			  request
> > + *
> > + * @update:	original coefficient update
> > + *
> > + * Helper to transform the update request of one equalization tap into a
> > + * request of the same tap in the opposite direction. May be used by C72
> > + * phy remote TX link training algorithms.
> > + */
> > +static inline enum coef_update coef_update_opposite(enum coef_update update)
> 
> Hi Vladimir,
> 
> another nit from me.
> 
> Please put the inline keyword first.
> Likewise elsewhere in this patch.
> 
> Tooling, including gcc-13 with W=1, complains about this.

Thanks for pointing this out. I guess you are talking about the c72_coef_update_print()
function, whose prototype is mistakenly "static void inline" instead of
"static inline void". I cannot find the problem with the quoted coef_update_opposite().
Vladimir Oltean Oct. 2, 2023, 2:17 p.m. UTC | #8
Hi Russell,

On Sat, Sep 23, 2023 at 04:48:57PM +0300, Vladimir Oltean wrote:
> Some phylink and phylib based systems might want to operate on backplane
> media types ("K" in the name), and thus, picking a phy_interface_t for
> them becomes a challenge.
> 
> phy_interface_t is a description of the connection between the MAC and
> the PHY, or if a MAC-side PCS is present, the connection between that
> and the next link segment (which can be remote).
> 
> A MAC-side PCS is so far considered to be a PCS handling link modes with
> optional C37 autoneg. But C73 autoneg (for backplanes and SFP28 modules)
> is not at the same level in the OSI layering, so that existing model may
> or may not apply.
> 
> (a) If we say that the PCS is MAC-side for C73 modes as well, the
>     implication seems to be that the phy-mode should be one of
>     PHY_INTERFACE_MODE_10GBASEKR, PHY_INTERFACE_MODE_1000BASEKX, etc.
>     Similar to PHY_INTERFACE_MODE_1000BASEX which imitates the link mode
>     ETHTOOL_LINK_MODE_1000baseX_Full_BIT.
> 
> (b) If we say that the PCS is not MAC-side, but rather that the
>     phylink_pcs represents an entire non-phylib backplane PHY which may
>     negotiate one of many link modes (like a copper phylib PHY), then
>     the phy-mode should probably be one of PHY_INTERFACE_MODE_XGMII,
>     XLGMII etc. Or rather, because there is no MII pinout per se and the
>     backplane PHY / phylink_pcs is internal, we can also use
>     PHY_INTERFACE_MODE_INTERNAL.
> 
> The trouble with (a), in my opinion, is that if we let the phy_interface_t
> follow the link mode like in the case of Base-X fiber modes, we have to
> consider the fact that C73 PHYs can advertise multiple link modes, so
> the phy_interface_t selection will be arbitrary, and any phy_interface_t
> selection will have to leave in the "supported" and "advertised" masks
> of link modes all the other backplane modes. This may be hard to justify.
> 
> That is the reasoning based on which I selected this phy-mode to
> describe the setup in Layerscape SoCs which have integrated backplane
> autoneg support. The changes in phylink permit the managed =
> "in-band-status" fwnode property to be extended for C73 autoneg, which
> is then controllable through ethtool. With phy-mode = "internal" in an
> in-band autoneg mode, we advertise all backplane link modes. The list is
> not exhaustive and may be extended in the future.
> 
> Link: https://lore.kernel.org/netdev/ZOXlpkbcAZ4okric@shell.armlinux.org.uk/
> Link: https://lore.kernel.org/netdev/ZGIkGmyL8yL1q1zp@shell.armlinux.org.uk/
> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
> ---
> v1->v2: patch is new
> 
>  drivers/net/phy/phylink.c | 19 ++++++++++++++++++-
>  include/linux/phylink.h   |  1 +
>  2 files changed, 19 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c
> index 548130d77302..88ace7e203c3 100644
> --- a/drivers/net/phy/phylink.c
> +++ b/drivers/net/phy/phylink.c
> @@ -972,6 +972,21 @@ static int phylink_parse_mode(struct phylink *pl,
>  			phylink_set(pl->supported, 100000baseDR2_Full);
>  			break;
>  
> +		case PHY_INTERFACE_MODE_INTERNAL:
> +			phylink_set(pl->supported, 1000baseKX_Full);
> +			phylink_set(pl->supported, 10000baseKX4_Full);
> +			phylink_set(pl->supported, 10000baseKR_Full);
> +			phylink_set(pl->supported, 25000baseCR_Full);
> +			phylink_set(pl->supported, 25000baseKR_Full);
> +			phylink_set(pl->supported, 25000baseCR_S_Full);
> +			phylink_set(pl->supported, 25000baseKR_S_Full);
> +			phylink_set(pl->supported, 40000baseKR4_Full);
> +			phylink_set(pl->supported, 50000baseKR2_Full);
> +			phylink_set(pl->supported, 50000baseKR_Full);
> +			phylink_set(pl->supported, 100000baseKR4_Full);
> +			phylink_set(pl->supported, 100000baseKR2_Full);
> +			break;
> +
>  		default:
>  			phylink_err(pl,
>  				    "incorrect link mode %s for in-band status\n",
> @@ -1109,7 +1124,9 @@ static void phylink_mac_config(struct phylink *pl,
>  
>  static bool phylink_pcs_handles_an(phy_interface_t iface, unsigned int mode)
>  {
> -	return phy_interface_mode_is_8023z(iface) && phylink_autoneg_inband(mode);
> +	return (phy_interface_mode_is_8023z(iface) ||
> +		iface == PHY_INTERFACE_MODE_INTERNAL) &&
> +	       phylink_autoneg_inband(mode);
>  }
>  
>  static void phylink_pcs_an_restart(struct phylink *pl)
> diff --git a/include/linux/phylink.h b/include/linux/phylink.h
> index 2b886ea654bb..7e8e26001587 100644
> --- a/include/linux/phylink.h
> +++ b/include/linux/phylink.h
> @@ -141,6 +141,7 @@ static inline unsigned int phylink_pcs_neg_mode(unsigned int mode,
>  
>  	case PHY_INTERFACE_MODE_1000BASEX:
>  	case PHY_INTERFACE_MODE_2500BASEX:
> +	case PHY_INTERFACE_MODE_INTERNAL:
>  		/* 1000base-X is designed for use media-side for Fibre
>  		 * connections, and thus the Autoneg bit needs to be
>  		 * taken into account. We also do this for 2500base-X
> -- 
> 2.34.1
>

What do you think about this change?
Simon Horman Oct. 2, 2023, 5:20 p.m. UTC | #9
On Mon, Oct 02, 2023 at 04:11:10PM +0300, Vladimir Oltean wrote:
> Hi Simon,
> 
> On Thu, Sep 28, 2023 at 09:05:36PM +0200, Simon Horman wrote:
> > On Sat, Sep 23, 2023 at 04:48:52PM +0300, Vladimir Oltean wrote:
> > 
> > ...
> > 
> > > +/**
> > > + * coef_update_opposite - return the opposite of one C72 coefficient update
> > > + *			  request
> > > + *
> > > + * @update:	original coefficient update
> > > + *
> > > + * Helper to transform the update request of one equalization tap into a
> > > + * request of the same tap in the opposite direction. May be used by C72
> > > + * phy remote TX link training algorithms.
> > > + */
> > > +static inline enum coef_update coef_update_opposite(enum coef_update update)
> > 
> > Hi Vladimir,
> > 
> > another nit from me.
> > 
> > Please put the inline keyword first.
> > Likewise elsewhere in this patch.
> > 
> > Tooling, including gcc-13 with W=1, complains about this.
> 
> Thanks for pointing this out. I guess you are talking about the c72_coef_update_print()
> function, whose prototype is mistakenly "static void inline" instead of
> "static inline void". I cannot find the problem with the quoted coef_update_opposite().

Yes, you are right.
Sorry for my error.
Florian Fainelli Oct. 2, 2023, 7:16 p.m. UTC | #10
On 9/23/23 06:48, Vladimir Oltean wrote:
> Some modules, like the MTIP AN/LT block used as a copper backplane PHY
> driver, need this extra information from the SerDes PHY as another
> source of "link up" information.
> 
> Namely, the 25GBase-R PCS does not have a MDIO_CTRL1_LPOWER bit
> implemented in its MDIO_MMD_PCS:MDIO_CTRL1 register. That bit is
> typically set from phy_suspend() or phylink_pcs_disable() implementations,
> and that is supposed to cause a link drop event on the link partner.
> But here it does not happen.
> 
> By implementing the networking phylink_pcs_disable() as phy_power_off(),
> we are able to actually power down the lane in a way that is visible to
> the remote end. Where it is visible is the CDR lock, so we introduce
> PHY_STATUS_TYPE_CDR_LOCK as an extra link indication, we are able to
> detect that condition and signal it to upper layers of the network
> stack.
> 
> A more high-level and generic phy_get_status() operation was chosen
> instead of the more specific phy_get_cdr_lock() alternative, because I
> saw this as being more in the spirit of the generic PHY API.
> Also, phy_get_status() is more extensible and reusable for other
> purposes as well.
> 
> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>

Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Florian Fainelli Oct. 2, 2023, 7:19 p.m. UTC | #11
On 9/23/23 06:48, Vladimir Oltean wrote:
> In networking, we have 2 distinct data types:
> 
> - phy_interface_t describes the link between a MAC (or MAC-side PCS) and
>    an attached PHY or SFP cage
> 
> - enum ethtool_link_mode_bit_indices describes the link between a local
>    PHY and a remote PHY (for example, gigabit RJ45 twisted copper pairs)
> 
> Currently, phy_set_mode_ext(PHY_MODE_ETHERNET) takes arguments of the
> phy_interface_t type, and there is no way to pass an argument of the
> enum ethtool_link_mode_bit_indices type. The new PHY_MODE_ETHTOOL
> intends to address that.
> 
> It is true that there is currently some overlap between these data
> types, namely:
> 
>   phy_interface_t                enum ethtool_link_mode_bit_indices
>   -----------------------------------------------------------------
>   PHY_INTERFACE_MODE_10GKR       ETHTOOL_LINK_MODE_10000baseKR_Full_BIT
>   PHY_INTERFACE_MODE_1000BASEKX  ETHTOOL_LINK_MODE_1000baseKX_Full_BIT
> 
> but those overlaps were deemed to be mistakes, and PHY-to-PHY link modes
> should only be added to ethtool_link_mode_bit_indices going forward.
> Thus, I believe that the distinction is necessary, rather than hacking
> more improper PHY modes. Some of the PHY-to-PHY link modes which may be
> added in the future (to ethtool_link_mode_bit_indices and not to
> phy_interface_t) are:
> 
> 	ETHTOOL_LINK_MODE_100000baseKP4_Full_BIT
> 	ETHTOOL_LINK_MODE_100000baseCR10_Full_BIT
> 	ETHTOOL_LINK_MODE_25000baseKR_S_Full_BIT
> 	ETHTOOL_LINK_MODE_25000baseCR_S_Full_BIT
> 
> One user of PHY_MODE_ETHTOOL will be the MTIP backplane AN/LT + Lynx
> SerDes PHY combo, where the backplane autoneg protocol (IEEE 802.3
> clause 73) selects the operating PHY-to-PHY link mode.
> 
> There are electrical differences between the PHY-to-PHY backplane link
> modes (like ETHTOOL_LINK_MODE_10000baseKR_Full_BIT) and their
> non-backplane counterparts (like PHY_INTERFACE_MODE_10GBASER), namely
> the number of TX signal equalization taps and their configurability.
> This further justifies distinguishing between them in the generic PHY
> API.
> 
> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
> ---
> v1->v2: rename PHY_MODE_ETHERNET_PHY to PHY_MODE_ETHTOOL at Russell's
> suggestion
> 
>   include/linux/phy/phy.h | 3 ++-
>   1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/phy/phy.h b/include/linux/phy/phy.h
> index 6be348f1fa0e..72ef4afcda81 100644
> --- a/include/linux/phy/phy.h
> +++ b/include/linux/phy/phy.h
> @@ -39,6 +39,7 @@ enum phy_mode {
>   	PHY_MODE_UFS_HS_B,
>   	PHY_MODE_PCIE,
>   	PHY_MODE_ETHERNET,
> +	PHY_MODE_ETHTOOL,

Not feeling very comfortable with using ETHTOOL here because that is a 
Linux sub-subsystem name as opposed to the other enumeration values 
which are electrical modes of operation and/or industry standards names.

How about PHY_MODE_ETHERNET_EXPLICIT or PHY_MODE_ETHERNET_LINKMODE?
Russell King (Oracle) Oct. 3, 2023, 11:06 a.m. UTC | #12
Hi Vladimir,

On Mon, Oct 02, 2023 at 05:17:43PM +0300, Vladimir Oltean wrote:
> Hi Russell,
> 
> On Sat, Sep 23, 2023 at 04:48:57PM +0300, Vladimir Oltean wrote:
> > Some phylink and phylib based systems might want to operate on backplane
> > media types ("K" in the name), and thus, picking a phy_interface_t for
> > them becomes a challenge.
> > 
> > phy_interface_t is a description of the connection between the MAC and
> > the PHY, or if a MAC-side PCS is present, the connection between that
> > and the next link segment (which can be remote).
> > 
> > A MAC-side PCS is so far considered to be a PCS handling link modes with
> > optional C37 autoneg. But C73 autoneg (for backplanes and SFP28 modules)
> > is not at the same level in the OSI layering, so that existing model may
> > or may not apply.
> > 
> > (a) If we say that the PCS is MAC-side for C73 modes as well, the
> >     implication seems to be that the phy-mode should be one of
> >     PHY_INTERFACE_MODE_10GBASEKR, PHY_INTERFACE_MODE_1000BASEKX, etc.
> >     Similar to PHY_INTERFACE_MODE_1000BASEX which imitates the link mode
> >     ETHTOOL_LINK_MODE_1000baseX_Full_BIT.
> > 
> > (b) If we say that the PCS is not MAC-side, but rather that the
> >     phylink_pcs represents an entire non-phylib backplane PHY which may
> >     negotiate one of many link modes (like a copper phylib PHY), then
> >     the phy-mode should probably be one of PHY_INTERFACE_MODE_XGMII,
> >     XLGMII etc. Or rather, because there is no MII pinout per se and the
> >     backplane PHY / phylink_pcs is internal, we can also use
> >     PHY_INTERFACE_MODE_INTERNAL.
> > 
> > The trouble with (a), in my opinion, is that if we let the phy_interface_t
> > follow the link mode like in the case of Base-X fiber modes, we have to
> > consider the fact that C73 PHYs can advertise multiple link modes, so
> > the phy_interface_t selection will be arbitrary, and any phy_interface_t
> > selection will have to leave in the "supported" and "advertised" masks
> > of link modes all the other backplane modes. This may be hard to justify.
> > 
> > That is the reasoning based on which I selected this phy-mode to
> > describe the setup in Layerscape SoCs which have integrated backplane
> > autoneg support. The changes in phylink permit the managed =
> > "in-band-status" fwnode property to be extended for C73 autoneg, which
> > is then controllable through ethtool. With phy-mode = "internal" in an
> > in-band autoneg mode, we advertise all backplane link modes. The list is
> > not exhaustive and may be extended in the future.
> > 
> > Link: https://lore.kernel.org/netdev/ZOXlpkbcAZ4okric@shell.armlinux.org.uk/
> > Link: https://lore.kernel.org/netdev/ZGIkGmyL8yL1q1zp@shell.armlinux.org.uk/
> > Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
> > ---
> > v1->v2: patch is new
> > 
> >  drivers/net/phy/phylink.c | 19 ++++++++++++++++++-
> >  include/linux/phylink.h   |  1 +
> >  2 files changed, 19 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c
> > index 548130d77302..88ace7e203c3 100644
> > --- a/drivers/net/phy/phylink.c
> > +++ b/drivers/net/phy/phylink.c
> > @@ -972,6 +972,21 @@ static int phylink_parse_mode(struct phylink *pl,
> >  			phylink_set(pl->supported, 100000baseDR2_Full);
> >  			break;
> >  
> > +		case PHY_INTERFACE_MODE_INTERNAL:
> > +			phylink_set(pl->supported, 1000baseKX_Full);
> > +			phylink_set(pl->supported, 10000baseKX4_Full);
> > +			phylink_set(pl->supported, 10000baseKR_Full);
> > +			phylink_set(pl->supported, 25000baseCR_Full);
> > +			phylink_set(pl->supported, 25000baseKR_Full);
> > +			phylink_set(pl->supported, 25000baseCR_S_Full);
> > +			phylink_set(pl->supported, 25000baseKR_S_Full);
> > +			phylink_set(pl->supported, 40000baseKR4_Full);
> > +			phylink_set(pl->supported, 50000baseKR2_Full);
> > +			phylink_set(pl->supported, 50000baseKR_Full);
> > +			phylink_set(pl->supported, 100000baseKR4_Full);
> > +			phylink_set(pl->supported, 100000baseKR2_Full);
> > +			break;

I wonder whether this should just set all link modes, much like
phylink_get_capabilities() allows.

I'm also wondering whether the contents of this switch() statement
should now just do:

		case PHY_INTERFACE_... (for each supported mode):
			cals = ~(MAC_SYM_PAUSE | MAC_ASYM_PAUSE);
			caps = phylink_get_capabilities(interface, caps,
							RATE_MATCH_NONE);
			phylink_caps_to_linkmodes(pl->supported, caps);
			break;

rather than duplicating the logic.

That said, 10GBASER and 10GKR are treated slightly differently because
of the problem with PHYs like 88x3310, and I think it's now difficult
to undo that bit of history.

> > +
> >  		default:
> >  			phylink_err(pl,
> >  				    "incorrect link mode %s for in-band status\n",
> > @@ -1109,7 +1124,9 @@ static void phylink_mac_config(struct phylink *pl,
> >  
> >  static bool phylink_pcs_handles_an(phy_interface_t iface, unsigned int mode)
> >  {
> > -	return phy_interface_mode_is_8023z(iface) && phylink_autoneg_inband(mode);
> > +	return (phy_interface_mode_is_8023z(iface) ||
> > +		iface == PHY_INTERFACE_MODE_INTERNAL) &&
> > +	       phylink_autoneg_inband(mode);

Is this true also for DSA devices that use "internal" mode? I'm
wondering whether this will cause the PHY to be ignored/remain
unattached in DSA switches because of the changes in patch 7.

> >  }
> >  
> >  static void phylink_pcs_an_restart(struct phylink *pl)
> > diff --git a/include/linux/phylink.h b/include/linux/phylink.h
> > index 2b886ea654bb..7e8e26001587 100644
> > --- a/include/linux/phylink.h
> > +++ b/include/linux/phylink.h
> > @@ -141,6 +141,7 @@ static inline unsigned int phylink_pcs_neg_mode(unsigned int mode,
> >  
> >  	case PHY_INTERFACE_MODE_1000BASEX:
> >  	case PHY_INTERFACE_MODE_2500BASEX:
> > +	case PHY_INTERFACE_MODE_INTERNAL:
> >  		/* 1000base-X is designed for use media-side for Fibre
> >  		 * connections, and thus the Autoneg bit needs to be
> >  		 * taken into account. We also do this for 2500base-X

Thinking about DSA cases, I don't think this change would be an issue
because where DSA uses "internal" there isn't a PCS, so this won't
matter.

Note that as there is now no need for anything outside phylink.c to
reference this function, I have plans at some point to move it into
the .c file rather than keeping it as an inline in the header file.
It was temporarily necessary while introducing it to be in the
header.
Russell King (Oracle) Oct. 3, 2023, 11:19 a.m. UTC | #13
On Sat, Sep 23, 2023 at 04:48:54PM +0300, Vladimir Oltean wrote:
> diff --git a/include/uapi/linux/ethtool.h b/include/uapi/linux/ethtool.h
> index f7fba0dc87e5..421eb57fb6e9 100644
> --- a/include/uapi/linux/ethtool.h
> +++ b/include/uapi/linux/ethtool.h
> @@ -1787,6 +1787,8 @@ enum ethtool_link_mode_bit_indices {
>  	ETHTOOL_LINK_MODE_10baseT1S_Full_BIT		 = 99,
>  	ETHTOOL_LINK_MODE_10baseT1S_Half_BIT		 = 100,
>  	ETHTOOL_LINK_MODE_10baseT1S_P2MP_Half_BIT	 = 101,
> +	ETHTOOL_LINK_MODE_25000baseCR_S_Full_BIT	 = 102,
> +	ETHTOOL_LINK_MODE_25000baseKR_S_Full_BIT	 = 103,

Should these also be add to phylink_caps_to_linkmodes()'s MAC_25000FD
conditional block?
Russell King (Oracle) Oct. 3, 2023, 11:27 a.m. UTC | #14
On Sat, Sep 23, 2023 at 04:48:56PM +0300, Vladimir Oltean wrote:
> In a future change, we will extend the PHY interface modes for which
> phylink allows the PCS to handle autoneg. Group the existing occurences
> into a common phylink_pcs_handles_an().

I don't see anything wrong with this change, despite my comments on the
next patch. However, including INTERNAL in this may cause problems with
DSA. I think maybe these two patches need to be tested on DSA setups
that make use of INTERNAL.
Russell King (Oracle) Oct. 3, 2023, 11:30 a.m. UTC | #15
On Sat, Sep 23, 2023 at 04:48:58PM +0300, Vladimir Oltean wrote:
> Allow driver code to print stuff like the resolved link mode to the
> kernel log, by giving it access to the link_mode_names[] ethtool
> internal array which already holds this info.

Looks good to me.

> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>

Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>

Thanks!
Russell King (Oracle) Oct. 3, 2023, 11:31 a.m. UTC | #16
On Sat, Sep 23, 2023 at 04:49:00PM +0300, Vladimir Oltean wrote:
> Treat the newly introduced subsets of 25GBase-KR and 25GBase-CR the same
> way as the fully-featured link modes. The difference only consists in
> RS-FEC support.

As mentioned in the patch adding these new linkmodes, I wonder whether
this should be part of that patch. Is there a reason to keep it
separate?
Russell King (Oracle) Oct. 3, 2023, 1:14 p.m. UTC | #17
On Sat, Sep 23, 2023 at 04:49:04PM +0300, Vladimir Oltean wrote:
> +static int lynx_pcs_parse_fwnode(struct lynx_pcs *lynx)
> +{
> +	struct fwnode_handle *node = lynx->mdio->dev.fwnode;
> +	enum mtip_model model = MTIP_MODEL_AUTODETECT;
> +	struct device_node *np = to_of_node(node);
> +	struct mdio_device *mdio = lynx->mdio;
> +	struct device *dev = &mdio->dev;
> +	struct phy *phy;
> +	int i, err;
> +
> +	if (!node)
> +		return 0;
> +
> +	lynx->backplane_mode = fwnode_property_present(node, "fsl,backplane-mode");
> +	if (!lynx->backplane_mode)
> +		return 0;
> +
> +	if (fwnode_device_is_compatible(node, "fsl,lx2160a-lynx-pcs"))
> +		model = MTIP_MODEL_LX2160A;
> +
> +	lynx->num_lanes = of_count_phandle_with_args(np, "phys", "#phy-cells");
> +	if (lynx->num_lanes < 0)
> +		return lynx->num_lanes;

Is it possible for ->num_lanes to be zero at this point? If that is
possible, then ->anlt[PRIMARY_LANE] will be NULL but ->backplane_mode
will be set, so won't that cause the mtip_* calls above to pass a
NULL pointer into those functions? Is that safe? Should we trap that
case here?

If that's correct, then I don't see any point in storing
->backplane_mode, since we can then use ->num_lanes > PRIMARY_LANE
or similar instead.

> +
> +	if (WARN_ON(lynx->num_lanes > MAX_NUM_LANES))
> +		return -EINVAL;

Do we need to use WARN_ON() here, or would it be better to print a short
error-level message?

> +
> +	for (i = 0; i < lynx->num_lanes; i++) {
> +		phy = devm_of_phy_get_by_index(dev, np, i);
> +		if (IS_ERR(phy))
> +			return dev_err_probe(dev, PTR_ERR(phy),
> +					     "Failed to get SerDes PHY %d\n", i);
> +
> +		lynx->anlt[i] = mtip_backplane_create(mdio, phy, model);
> +		if (IS_ERR(lynx->anlt[i])) {
> +			err = PTR_ERR(lynx->anlt[i]);
> +
> +			while (i-- > 0)
> +				mtip_backplane_destroy(lynx->anlt[i]);
> +
> +			return err;
> +		}
> +	}
> +
> +	for (i = 1; i < lynx->num_lanes; i++) {
> +		err = mtip_backplane_add_subordinate(lynx->anlt[PRIMARY_LANE],
> +						     lynx->anlt[i]);
> +		if (WARN_ON(err)) {

Again, does this need to be a backtrace-producing WARN_ON()?

> +			/* Too many SerDes lanes in the device tree? */
> +			for (i = 0; i < lynx->num_lanes; i++)
> +				mtip_backplane_destroy(lynx->anlt[i]);
> +			return err;
> +		}
> +	}
> +
> +	return 0;
> +}
> +
>  static struct phylink_pcs *lynx_pcs_create(struct mdio_device *mdio)
>  {
>  	struct lynx_pcs *lynx;
> +	int err;
>  
>  	lynx = kzalloc(sizeof(*lynx), GFP_KERNEL);
>  	if (!lynx)
> @@ -327,6 +451,12 @@ static struct phylink_pcs *lynx_pcs_create(struct mdio_device *mdio)
>  	lynx->pcs.neg_mode = true;
>  	lynx->pcs.poll = true;
>  
> +	err = lynx_pcs_parse_fwnode(lynx);
> +	if (err) {
> +		kfree(lynx);
> +		return ERR_PTR(err);
> +	}
> +
>  	return lynx_to_phylink_pcs(lynx);
>  }
>  
> @@ -392,6 +522,11 @@ EXPORT_SYMBOL_GPL(lynx_pcs_create_fwnode);
>  void lynx_pcs_destroy(struct phylink_pcs *pcs)
>  {
>  	struct lynx_pcs *lynx = phylink_pcs_to_lynx(pcs);
> +	int i;
> +
> +	if (lynx->backplane_mode)
> +		for (i = 0; i < lynx->num_lanes; i++)
> +			mtip_backplane_destroy(lynx->anlt[i]);

Won't ->num_lanes only be non-zero when ->backplane_mode is set, so
isn't the test for ->backplane_mode redundant here?

>  
>  	mdio_device_put(lynx->mdio);
>  	kfree(lynx);
> -- 
> 2.34.1
> 
>
Russell King (Oracle) Oct. 3, 2023, 1:16 p.m. UTC | #18
On Sat, Sep 23, 2023 at 04:49:01PM +0300, Vladimir Oltean wrote:
> Allow phylink_resolve_c73() to resolve backplane (KR) or SFP28 (CR)
> link speeds of 25Gbps.
> 
> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>

Shouldn't this also be part of patch 5?
Vladimir Oltean Oct. 3, 2023, 4:04 p.m. UTC | #19
Hi Florian,

On Mon, Oct 02, 2023 at 12:19:57PM -0700, Florian Fainelli wrote:
> > diff --git a/include/linux/phy/phy.h b/include/linux/phy/phy.h
> > index 6be348f1fa0e..72ef4afcda81 100644
> > --- a/include/linux/phy/phy.h
> > +++ b/include/linux/phy/phy.h
> > @@ -39,6 +39,7 @@ enum phy_mode {
> >   	PHY_MODE_UFS_HS_B,
> >   	PHY_MODE_PCIE,
> >   	PHY_MODE_ETHERNET,
> > +	PHY_MODE_ETHTOOL,
> 
> Not feeling very comfortable with using ETHTOOL here because that is a Linux
> sub-subsystem name as opposed to the other enumeration values which are
> electrical modes of operation and/or industry standards names.
> 
> How about PHY_MODE_ETHERNET_EXPLICIT or PHY_MODE_ETHERNET_LINKMODE?
> -- 
> Florian
>

I agree with your sentiment. I can rename this to PHY_MODE_ETHERNET_LINKMODE.
Vladimir Oltean Oct. 3, 2023, 4:24 p.m. UTC | #20
Hi Russell,

On Tue, Oct 03, 2023 at 12:31:46PM +0100, Russell King (Oracle) wrote:
> On Sat, Sep 23, 2023 at 04:49:00PM +0300, Vladimir Oltean wrote:
> > Treat the newly introduced subsets of 25GBase-KR and 25GBase-CR the same
> > way as the fully-featured link modes. The difference only consists in
> > RS-FEC support.
> 
> As mentioned in the patch adding these new linkmodes, I wonder whether
> this should be part of that patch. Is there a reason to keep it
> separate?

I can squash this into the other CR-S/KR-S patch.
Vladimir Oltean Oct. 3, 2023, 4:33 p.m. UTC | #21
On Tue, Oct 03, 2023 at 02:16:25PM +0100, Russell King (Oracle) wrote:
> On Sat, Sep 23, 2023 at 04:49:01PM +0300, Vladimir Oltean wrote:
> > Allow phylink_resolve_c73() to resolve backplane (KR) or SFP28 (CR)
> > link speeds of 25Gbps.
> > 
> > Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
> 
> Shouldn't this also be part of patch 5?

Not really, no.

Apart from adding the 25000baseKR_S_Full and 25000baseCR_S_Full link
modes (which are indeed newly added in patch 5) to phylink_c73_priority_resolution[],
it also adds the pre-existing 25000baseKR_Full and 25000baseCR_Full link
modes. Without this, phylink fails to resolve the 25G backplane or SFP28
speeds, and it just reports "Link is up - unknown/unknown".

The patch splitting may have been confusing. I had 2 options, either:

(a) - create one patch which adds the missing pre-existing 25G backplane/
      SFP28 modes to phylink_c73_priority_resolution[]
    - add the CR-S/KR-S link modes to phylink_c73_priority_resolution[]
      as part of the general CR-S/KR-S addition

or

(b) - first add the CR-S/KR-S link modes everywhere where phylink
      already uses 25GBase-KR/25GBase-CR
    - extend the phylink_c73_priority_resolution[] for all 4 link modes
      at the same time

I opted for (b) but I can also go with (a) if you prefer it that way.
Vladimir Oltean Oct. 3, 2023, 7 p.m. UTC | #22
Hi Russell,

On Tue, Oct 03, 2023 at 02:14:41PM +0100, Russell King (Oracle) wrote:
> On Sat, Sep 23, 2023 at 04:49:04PM +0300, Vladimir Oltean wrote:
> > +static int lynx_pcs_parse_fwnode(struct lynx_pcs *lynx)
> > +{
> > +	struct fwnode_handle *node = lynx->mdio->dev.fwnode;
> > +	enum mtip_model model = MTIP_MODEL_AUTODETECT;
> > +	struct device_node *np = to_of_node(node);
> > +	struct mdio_device *mdio = lynx->mdio;
> > +	struct device *dev = &mdio->dev;
> > +	struct phy *phy;
> > +	int i, err;
> > +
> > +	if (!node)
> > +		return 0;
> > +
> > +	lynx->backplane_mode = fwnode_property_present(node, "fsl,backplane-mode");
> > +	if (!lynx->backplane_mode)
> > +		return 0;
> > +
> > +	if (fwnode_device_is_compatible(node, "fsl,lx2160a-lynx-pcs"))
> > +		model = MTIP_MODEL_LX2160A;
> > +
> > +	lynx->num_lanes = of_count_phandle_with_args(np, "phys", "#phy-cells");
> > +	if (lynx->num_lanes < 0)
> > +		return lynx->num_lanes;
> 
> Is it possible for ->num_lanes to be zero at this point? If that is
> possible, then ->anlt[PRIMARY_LANE] will be NULL but ->backplane_mode
> will be set, so won't that cause the mtip_* calls above to pass a
> NULL pointer into those functions? Is that safe? Should we trap that
> case here?

Assuming the dt-bindings as proposed here, that case would be an invalid
device tree ("fsl,backplane-mode" present but "phys" isn't present),
which I indeed failed to catch.

But in my reply to Krzysztof here:
https://lore.kernel.org/netdev/20231002121958.xybzovgjzldfiae2@skbuf/

I said that for v3, I'm looking to move the "phys" property from the PCS
to the MAC (it's already in the MAC for the non-backplane use cases).

On LS1028A (ENETC, Felix), the lynx-pcs is not OF-based (we use
lynx_pcs_create_mdiodev()), but it would be good to support the
1000Base-KX link mode there also. As I'm starting to look beyond
LX2160A, I'm starting to see why adding extra dt-bindings to the
lynx-pcs (both "phys" and "fsl,backplane-mode") will be problematic
if there is no OF node to speak of.

I will leave a separate comment with some new ideas.

> If that's correct, then I don't see any point in storing
> ->backplane_mode, since we can then use ->num_lanes > PRIMARY_LANE
> or similar instead.

Well, in v3, my plan is for the caller of lynx_pcs_create() (aka the MAC)
to always pass an array of phys (the ones from its own OF node). In that
case, we would indeed need the "fsl,backplane-mode" property in the PCS,
because otherwise, with your proposal, the PCS would instantiate the
AN/LT block even when it's not expected.

> > +
> > +	if (WARN_ON(lynx->num_lanes > MAX_NUM_LANES))
> > +		return -EINVAL;
> 
> Do we need to use WARN_ON() here, or would it be better to print a short
> error-level message?

Admittedly I may not have the best intuition here, but I didn't want to
over-complicate the code with error messages that can only be triggered
with invalid device trees.

> > +
> > +	for (i = 0; i < lynx->num_lanes; i++) {
> > +		phy = devm_of_phy_get_by_index(dev, np, i);
> > +		if (IS_ERR(phy))
> > +			return dev_err_probe(dev, PTR_ERR(phy),
> > +					     "Failed to get SerDes PHY %d\n", i);
> > +
> > +		lynx->anlt[i] = mtip_backplane_create(mdio, phy, model);
> > +		if (IS_ERR(lynx->anlt[i])) {
> > +			err = PTR_ERR(lynx->anlt[i]);
> > +
> > +			while (i-- > 0)
> > +				mtip_backplane_destroy(lynx->anlt[i]);
> > +
> > +			return err;
> > +		}
> > +	}
> > +
> > +	for (i = 1; i < lynx->num_lanes; i++) {
> > +		err = mtip_backplane_add_subordinate(lynx->anlt[PRIMARY_LANE],
> > +						     lynx->anlt[i]);
> > +		if (WARN_ON(err)) {
> 
> Again, does this need to be a backtrace-producing WARN_ON()?

mtip_backplane_add_subordinate() will only return -ERANGE if called too
many times (more than MTIP_MAX_NUM_SUBORDINATES times, aka more than
"MAX_NUM_LANES - 1" times).

Given the way that the code is constructed, it is technically impossible
for that to happen, but only because MTIP_MAX_NUM_SUBORDINATES is
hand-crafted to be 3 and MAX_NUM_LANES to be 4. I think that if I define
MTIP_MAX_NUM_SUBORDINATES in terms of MAX_NUM_LANES - 1, I can simply
make mtip_backplane_add_subordinate() return void.

What I want to avoid is to add error handling for errors which cannot
take place. Which is where the WARN_ON() came from.

> > +			/* Too many SerDes lanes in the device tree? */
> > +			for (i = 0; i < lynx->num_lanes; i++)
> > +				mtip_backplane_destroy(lynx->anlt[i]);
> > +			return err;
> > +		}
> > +	}
> > +
> > +	return 0;
> > +}
> > +
> >  static struct phylink_pcs *lynx_pcs_create(struct mdio_device *mdio)
> >  {
> >  	struct lynx_pcs *lynx;
> > +	int err;
> >  
> >  	lynx = kzalloc(sizeof(*lynx), GFP_KERNEL);
> >  	if (!lynx)
> > @@ -327,6 +451,12 @@ static struct phylink_pcs *lynx_pcs_create(struct mdio_device *mdio)
> >  	lynx->pcs.neg_mode = true;
> >  	lynx->pcs.poll = true;
> >  
> > +	err = lynx_pcs_parse_fwnode(lynx);
> > +	if (err) {
> > +		kfree(lynx);
> > +		return ERR_PTR(err);
> > +	}
> > +
> >  	return lynx_to_phylink_pcs(lynx);
> >  }
> >  
> > @@ -392,6 +522,11 @@ EXPORT_SYMBOL_GPL(lynx_pcs_create_fwnode);
> >  void lynx_pcs_destroy(struct phylink_pcs *pcs)
> >  {
> >  	struct lynx_pcs *lynx = phylink_pcs_to_lynx(pcs);
> > +	int i;
> > +
> > +	if (lynx->backplane_mode)
> > +		for (i = 0; i < lynx->num_lanes; i++)
> > +			mtip_backplane_destroy(lynx->anlt[i]);
> 
> Won't ->num_lanes only be non-zero when ->backplane_mode is set, so
> isn't the test for ->backplane_mode redundant here?

I think it won't be redundant anymore once the series reaches a less
"WIP" state.

> >  
> >  	mdio_device_put(lynx->mdio);
> >  	kfree(lynx);
> > -- 
> > 2.34.1
> > 
> > 
> 
> -- 
> RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
> FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!
Vladimir Oltean Oct. 3, 2023, 9:03 p.m. UTC | #23
On Tue, Oct 03, 2023 at 12:27:56PM +0100, Russell King (Oracle) wrote:
> On Sat, Sep 23, 2023 at 04:48:56PM +0300, Vladimir Oltean wrote:
> > In a future change, we will extend the PHY interface modes for which
> > phylink allows the PCS to handle autoneg. Group the existing occurences
> > into a common phylink_pcs_handles_an().
> 
> I don't see anything wrong with this change, despite my comments on the
> next patch. However, including INTERNAL in this may cause problems with
> DSA. I think maybe these two patches need to be tested on DSA setups
> that make use of INTERNAL.

Prompted by your observation in a different comment, I've searched for
PHY_INTERFACE_MODE_10GKR and found aqr107_read_status(), which sets
phydev->interface to this value based on the MDIO_MMD_PHYXS :
MDIO_PHYXS_VEND_IF_STATUS field.

I am now of the opinion that phy_interface_t is a phylib-only property,
more than anything else, and a slightly obsolete/hard to extend concept,
at that.

The selling point of phylink_pcs is the possibility to have an optional
phylink PHY, so I wouldn't want to reinterpret phy-mode = "internal" and
managed = "in-band-status" to mean "clause 73 autoneg in the phylink_pcs",
because that would quickly clash with the phylib PHY's desire to see the
phy-mode set to something else (it is definitely not "internal").

I don't have access to an AQR107 with firmware provisioning for KR on
the system interface, but presuming the feature isn't entirely bogus,
I shouldn't deliberately close the door for it.

I am exploring the possibility of adding support for 'managed = "c73"'
as another autoneg mode in phylink, among the existing MLO_AN_FIXED,
MLO_AN_PHY and MLO_AN_INBAND. It is quite clear that MLO_AN_INBAND !=
MLO_AN_C73, because the former is the PCS doing the negotiation and the
latter is a dedicated block selecting a technology-specific PCS. So I
think there is room for this extra autoneg mode.

I am also trying to see if there's anything hardwired into phylink to
require a phy-mode when a phylib PHY is absent. I will find that out in
the following days, while working on the v3. I guess for optical SFPs,
it just made sense to reuse the same data structures, to present MAC
drivers the same kind of API as for SFP modules with PHYs. But with C73
autoneg, I don't think it makes as much sense.

If phylink can be made to not require a phy-mode, I would prefer to pass
PHY_INTERFACE_MODE_NA, and for it to be completely ignored for MLO_AN_C73,
both in phylink and in phylink-using drivers.
Florian Fainelli Oct. 4, 2023, 12:08 a.m. UTC | #24
On 9/23/2023 6:48 AM, Vladimir Oltean wrote:
> Allow driver code to print stuff like the resolved link mode to the
> kernel log, by giving it access to the link_mode_names[] ethtool
> internal array which already holds this info.
> 
> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>

Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>