net: macb: reject unsupported rgmii delays

Message ID	20200616074955.GA9092@laureti-dev
State	Changes Requested
Delegated to:	David Miller
Headers	show Return-Path: <netdev-owner@vger.kernel.org> Date: Tue, 16 Jun 2020 09:49:56 +0200 From: Helmut Grohne <helmut.grohne@intenta.de> To: Nicolas Ferre <nicolas.ferre@microchip.com>, "David S. Miller" <davem@davemloft.net>, Jakub Kicinski <kuba@kernel.org>, Russell King <linux@armlinux.org.uk>, Palmer Dabbelt <palmer@dabbelt.com>, Paul Walmsley <paul.walmsley@sifive.com>, <netdev@vger.kernel.org> Subject: [PATCH] net: macb: reject unsupported rgmii delays Message-ID: <20200616074955.GA9092@laureti-dev> MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Disposition: inline User-Agent: Mutt/1.10.1 (2018-07-13) Sender: netdev-owner@vger.kernel.org Precedence: bulk
Series	net: macb: reject unsupported rgmii delays \| expand net: macb: reject unsupported rgmii delays

Helmut Grohne June 16, 2020, 7:49 a.m. UTC

The macb driver does not support configuring rgmii delays. At least for
the Zynq GEM, delays are not supported by the hardware at all. However,
the driver happily accepts and ignores any such delays.

When operating in a mac to phy connection, the delay setting applies to
the phy. Since the MAC does not support delays, the phy must provide
them and the only supported mode is rgmii-id.  However, in a fixed mac
to mac connection, the delay applies to the mac itself. Therefore the
only supported rgmii mode is rgmii.

Link: https://lore.kernel.org/netdev/20200610081236.GA31659@laureti-dev/
Signed-off-by: Helmut Grohne <helmut.grohne@intenta.de>
---
 drivers/net/ethernet/cadence/macb_main.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

Nicolas Ferre June 17, 2020, 9:24 a.m. UTC | #1

On 16/06/2020 at 09:49, Helmut Grohne wrote:
> The macb driver does not support configuring rgmii delays. At least for
> the Zynq GEM, delays are not supported by the hardware at all. However,
> the driver happily accepts and ignores any such delays.
> 
> When operating in a mac to phy connection, the delay setting applies to
> the phy. Since the MAC does not support delays, the phy must provide
> them and the only supported mode is rgmii-id.  However, in a fixed mac
> to mac connection, the delay applies to the mac itself. Therefore the
> only supported rgmii mode is rgmii.
> 
> Link: https://lore.kernel.org/netdev/20200610081236.GA31659@laureti-dev/
> Signed-off-by: Helmut Grohne <helmut.grohne@intenta.de>
> ---
>   drivers/net/ethernet/cadence/macb_main.c | 9 ++++++++-
>   1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c
> index 5b9d7c60eebc..bee5bf65e8b3 100644
> --- a/drivers/net/ethernet/cadence/macb_main.c
> +++ b/drivers/net/ethernet/cadence/macb_main.c
> @@ -514,7 +514,7 @@ static void macb_validate(struct phylink_config *config,
>              state->interface != PHY_INTERFACE_MODE_RMII &&
>              state->interface != PHY_INTERFACE_MODE_GMII &&
>              state->interface != PHY_INTERFACE_MODE_SGMII &&
> -           !phy_interface_mode_is_rgmii(state->interface)) {
> +           state->interface != PHY_INTERFACE_MODE_RGMII_ID) {

Nitpicking: there's a comment just above, might be interesting to make 
it more precisely matching this change. It mustn't delay addition though.

>                  bitmap_zero(supported, __ETHTOOL_LINK_MODE_MASK_NBITS);
>                  return;
>          }
> @@ -694,6 +694,13 @@ static int macb_phylink_connect(struct macb *bp)
>          struct phy_device *phydev;
>          int ret;
> 
> +       if (of_phy_is_fixed_link(dn) &&
> +           phy_interface_mode_is_rgmii(bp->phy_interface) &&
> +           bp->phy_interface != PHY_INTERFACE_MODE_RGMII) {
> +               netdev_err(dev, "RGMII delays are not supported\n");
> +               return -EINVAL;
> +       }
> +

Otherwise, it looks good to me after reading the associated discussion 
link in your commit message: thanks for that!

Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com>

>          if (dn)
>                  ret = phylink_of_phy_connect(bp->phylink, dn, 0);
> 
> --
> 2.20.1
>

Vladimir Oltean June 17, 2020, 9:57 a.m. UTC | #2

Hi Helmut,

On Tue, 16 Jun 2020 at 11:00, Helmut Grohne <helmut.grohne@intenta.de> wrote:
>
> The macb driver does not support configuring rgmii delays. At least for
> the Zynq GEM, delays are not supported by the hardware at all. However,
> the driver happily accepts and ignores any such delays.
>
> When operating in a mac to phy connection, the delay setting applies to
> the phy. Since the MAC does not support delays, the phy must provide
> them and the only supported mode is rgmii-id.  However, in a fixed mac
> to mac connection, the delay applies to the mac itself. Therefore the
> only supported rgmii mode is rgmii.
>
> Link: https://lore.kernel.org/netdev/20200610081236.GA31659@laureti-dev/
> Signed-off-by: Helmut Grohne <helmut.grohne@intenta.de>
> ---
>  drivers/net/ethernet/cadence/macb_main.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c
> index 5b9d7c60eebc..bee5bf65e8b3 100644
> --- a/drivers/net/ethernet/cadence/macb_main.c
> +++ b/drivers/net/ethernet/cadence/macb_main.c
> @@ -514,7 +514,7 @@ static void macb_validate(struct phylink_config *config,
>             state->interface != PHY_INTERFACE_MODE_RMII &&
>             state->interface != PHY_INTERFACE_MODE_GMII &&
>             state->interface != PHY_INTERFACE_MODE_SGMII &&
> -           !phy_interface_mode_is_rgmii(state->interface)) {
> +           state->interface != PHY_INTERFACE_MODE_RGMII_ID) {

I don't think this change is correct though?
What if there were PCB traces in place, for whatever reason? Then the
driver would need to accept a phy with rgmii-txid, rgmii-rxid or
rgmii.

>                 bitmap_zero(supported, __ETHTOOL_LINK_MODE_MASK_NBITS);
>                 return;
>         }
> @@ -694,6 +694,13 @@ static int macb_phylink_connect(struct macb *bp)
>         struct phy_device *phydev;
>         int ret;
>
> +       if (of_phy_is_fixed_link(dn) &&
> +           phy_interface_mode_is_rgmii(bp->phy_interface) &&
> +           bp->phy_interface != PHY_INTERFACE_MODE_RGMII) {
> +               netdev_err(dev, "RGMII delays are not supported\n");
> +               return -EINVAL;
> +       }
> +

Have you checked that this doesn't break any existing in-tree users?

>         if (dn)
>                 ret = phylink_of_phy_connect(bp->phylink, dn, 0);
>
> --
> 2.20.1
>

Thanks,
-Vladimir

Russell King (Oracle) June 17, 2020, 10:55 a.m. UTC | #3

On Tue, Jun 16, 2020 at 09:49:56AM +0200, Helmut Grohne wrote:
> The macb driver does not support configuring rgmii delays. At least for
> the Zynq GEM, delays are not supported by the hardware at all. However,
> the driver happily accepts and ignores any such delays.
> 
> When operating in a mac to phy connection, the delay setting applies to
> the phy. Since the MAC does not support delays, the phy must provide
> them and the only supported mode is rgmii-id.  However, in a fixed mac
> to mac connection, the delay applies to the mac itself. Therefore the
> only supported rgmii mode is rgmii.

This seems incorrect - see the phy documentation in
Documentation/networking/phy.rst:

* PHY_INTERFACE_MODE_RGMII: the PHY is not responsible for inserting any
  internal delay by itself, it assumes that either the Ethernet MAC (if capable
  or the PCB traces) insert the correct 1.5-2ns delay

* PHY_INTERFACE_MODE_RGMII_TXID: the PHY should insert an internal delay
  for the transmit data lines (TXD[3:0]) processed by the PHY device

* PHY_INTERFACE_MODE_RGMII_RXID: the PHY should insert an internal delay
  for the receive data lines (RXD[3:0]) processed by the PHY device

* PHY_INTERFACE_MODE_RGMII_ID: the PHY should insert internal delays for
  both transmit AND receive data lines from/to the PHY device

Note that PHY_INTERFACE_MODE_RGMII, the delay can be added by _either_
the MAC or by PCB trace routing.

The individual RGMII delay modes are more about what the PHY itself is
asked to do with respect to inserting delays, so I don't think your
patch makes sense.

In any case...

> Link: https://lore.kernel.org/netdev/20200610081236.GA31659@laureti-dev/
> Signed-off-by: Helmut Grohne <helmut.grohne@intenta.de>
> ---
>  drivers/net/ethernet/cadence/macb_main.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c
> index 5b9d7c60eebc..bee5bf65e8b3 100644
> --- a/drivers/net/ethernet/cadence/macb_main.c
> +++ b/drivers/net/ethernet/cadence/macb_main.c
> @@ -514,7 +514,7 @@ static void macb_validate(struct phylink_config *config,
>  	    state->interface != PHY_INTERFACE_MODE_RMII &&
>  	    state->interface != PHY_INTERFACE_MODE_GMII &&
>  	    state->interface != PHY_INTERFACE_MODE_SGMII &&
> -	    !phy_interface_mode_is_rgmii(state->interface)) {
> +	    state->interface != PHY_INTERFACE_MODE_RGMII_ID) {

Here you reject everything except PHY_INTERFACE_MODE_RGMII_ID.

>  		bitmap_zero(supported, __ETHTOOL_LINK_MODE_MASK_NBITS);
>  		return;
>  	}
> @@ -694,6 +694,13 @@ static int macb_phylink_connect(struct macb *bp)
>  	struct phy_device *phydev;
>  	int ret;
>  
> +	if (of_phy_is_fixed_link(dn) &&
> +	    phy_interface_mode_is_rgmii(bp->phy_interface) &&
> +	    bp->phy_interface != PHY_INTERFACE_MODE_RGMII) {

but here you reject everything except PHY_INTERFACE_MODE_RGMII.  These
can't both be right.  If you start with PHY_INTERFACE_MODE_RGMII, and
have a fixed link, you'll have PHY_INTERFACE_MODE_RGMII passed into
the validate function, which will then fail.

Helmut Grohne June 17, 2020, 11:15 a.m. UTC | #4

Hi Vladimir,

On Wed, Jun 17, 2020 at 11:57:23AM +0200, Vladimir Oltean wrote:
> On Tue, 16 Jun 2020 at 11:00, Helmut Grohne <helmut.grohne@intenta.de> wrote:
> > --- a/drivers/net/ethernet/cadence/macb_main.c
> > +++ b/drivers/net/ethernet/cadence/macb_main.c
> > @@ -514,7 +514,7 @@ static void macb_validate(struct phylink_config *config,
> >             state->interface != PHY_INTERFACE_MODE_RMII &&
> >             state->interface != PHY_INTERFACE_MODE_GMII &&
> >             state->interface != PHY_INTERFACE_MODE_SGMII &&
> > -           !phy_interface_mode_is_rgmii(state->interface)) {
> > +           state->interface != PHY_INTERFACE_MODE_RGMII_ID) {
> 
> I don't think this change is correct though?
> What if there were PCB traces in place, for whatever reason? Then the
> driver would need to accept a phy with rgmii-txid, rgmii-rxid or
> rgmii.

I must confess that after rereading the discussion and the other
discussions at
https://patchwork.ozlabs.org/project/netdev/patch/20190410005700.31582-19-olteanv@gmail.com/
and
https://patchwork.ozlabs.org/project/netdev/patch/20190413012822.30931-21-olteanv@gmail.com/,
this is less and less clear to me.

I think we can agree that the current definition of phy-mode is
confusing when it comes to rgmii delays. The documentation doesn't even
mention the possibility of using serpentines.

Your interpretation seems to be that this value is solely meant for the
PHY to operate on and that the MAC should not act upon it (at least the
delay part). That interpetation is consistent with previous discussions
and mostly consistent with the documentation. The phy-mode property is
documented in ethernet-controller.yaml, which suggests that it should
apply to the MAC somehow.

Following this interpretation, I think it would be good to also document
it in ethernet-phy.yaml.

Thank you for the review. I agree that the hunk should be dropped.

However, in the fixed-link case (below) the interpretation regarding
serpentines seems to be well-defined: If serpentines are present, both
MACs should specify rgmii.

> >                 bitmap_zero(supported, __ETHTOOL_LINK_MODE_MASK_NBITS);
> >                 return;
> >         }
> > @@ -694,6 +694,13 @@ static int macb_phylink_connect(struct macb *bp)
> >         struct phy_device *phydev;
> >         int ret;
> >
> > +       if (of_phy_is_fixed_link(dn) &&
> > +           phy_interface_mode_is_rgmii(bp->phy_interface) &&
> > +           bp->phy_interface != PHY_INTERFACE_MODE_RGMII) {
> > +               netdev_err(dev, "RGMII delays are not supported\n");
> > +               return -EINVAL;
> > +       }
> > +
> 
> Have you checked that this doesn't break any existing in-tree users?

It seems like all the in-tree users of this driver that do specify a
phy-mode, specify rmii.

If possible breakage is to be minimized, I'd be happy with using
netdev_warn instead. The major benefit here is getting a clear
indication of why things don't work. My emphasis is on getting something
to dmesg, not to make things fail.

So what should we prefer here? Failure or warning?

In the long run, it would be nice to refactor the way to denote delays
in a way that is consistently defined for MAC to PHY and MAC to MAC
connections while accounting for serpentines.

Helmut

Helmut Grohne June 17, 2020, 11:21 a.m. UTC | #5

On Wed, Jun 17, 2020 at 12:55:18PM +0200, Russell King - ARM Linux admin wrote:
> The individual RGMII delay modes are more about what the PHY itself is
> asked to do with respect to inserting delays, so I don't think your
> patch makes sense.

This seems to be the same aspect that Vladimir Oltean remarked. I agree
that the relevant hunk should be dropped.

> > --- a/drivers/net/ethernet/cadence/macb_main.c
> > +++ b/drivers/net/ethernet/cadence/macb_main.c
> > @@ -514,7 +514,7 @@ static void macb_validate(struct phylink_config *config,
> >  	    state->interface != PHY_INTERFACE_MODE_RMII &&
> >  	    state->interface != PHY_INTERFACE_MODE_GMII &&
> >  	    state->interface != PHY_INTERFACE_MODE_SGMII &&
> > -	    !phy_interface_mode_is_rgmii(state->interface)) {
> > +	    state->interface != PHY_INTERFACE_MODE_RGMII_ID) {
> 
> Here you reject everything except PHY_INTERFACE_MODE_RGMII_ID.
> 
> >  		bitmap_zero(supported, __ETHTOOL_LINK_MODE_MASK_NBITS);
> >  		return;
> >  	}
> > @@ -694,6 +694,13 @@ static int macb_phylink_connect(struct macb *bp)
> >  	struct phy_device *phydev;
> >  	int ret;
> >  
> > +	if (of_phy_is_fixed_link(dn) &&
> > +	    phy_interface_mode_is_rgmii(bp->phy_interface) &&
> > +	    bp->phy_interface != PHY_INTERFACE_MODE_RGMII) {
> 
> but here you reject everything except PHY_INTERFACE_MODE_RGMII.  These
> can't both be right.  If you start with PHY_INTERFACE_MODE_RGMII, and
> have a fixed link, you'll have PHY_INTERFACE_MODE_RGMII passed into
> the validate function, which will then fail.

For a fixed-link, the validation function is never called. Therefore, it
cannot reject PHY_INTERFACE_MODE_RGMII. It works in practice.

However, the consensus is to not reject that mode in the validation
function.

Helmut

Vladimir Oltean June 17, 2020, 11:32 a.m. UTC | #6

On Wed, 17 Jun 2020 at 13:56, Russell King - ARM Linux admin
<linux@armlinux.org.uk> wrote:
>
> On Tue, Jun 16, 2020 at 09:49:56AM +0200, Helmut Grohne wrote:
> > The macb driver does not support configuring rgmii delays. At least for
> > the Zynq GEM, delays are not supported by the hardware at all. However,
> > the driver happily accepts and ignores any such delays.
> >
> > When operating in a mac to phy connection, the delay setting applies to
> > the phy. Since the MAC does not support delays, the phy must provide
> > them and the only supported mode is rgmii-id.  However, in a fixed mac
> > to mac connection, the delay applies to the mac itself. Therefore the
> > only supported rgmii mode is rgmii.
>
> This seems incorrect - see the phy documentation in
> Documentation/networking/phy.rst:
>
> * PHY_INTERFACE_MODE_RGMII: the PHY is not responsible for inserting any
>   internal delay by itself, it assumes that either the Ethernet MAC (if capable
>   or the PCB traces) insert the correct 1.5-2ns delay
>
> * PHY_INTERFACE_MODE_RGMII_TXID: the PHY should insert an internal delay
>   for the transmit data lines (TXD[3:0]) processed by the PHY device
>
> * PHY_INTERFACE_MODE_RGMII_RXID: the PHY should insert an internal delay
>   for the receive data lines (RXD[3:0]) processed by the PHY device
>
> * PHY_INTERFACE_MODE_RGMII_ID: the PHY should insert internal delays for
>   both transmit AND receive data lines from/to the PHY device
>
> Note that PHY_INTERFACE_MODE_RGMII, the delay can be added by _either_
> the MAC or by PCB trace routing.
>

What does it mean "can" be added? Is it or is it not added? As a MAC
driver, what do you do?

> The individual RGMII delay modes are more about what the PHY itself is
> asked to do with respect to inserting delays, so I don't think your
> patch makes sense.
>

We all read the phy-mode documentation, but we aren't really any
smarter. That document completely fails to address the existence of
PCB traces.
Helmut's link points to some more discussion around this topic.

> --
> RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
> FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

Russell King (Oracle) June 17, 2020, 11:34 a.m. UTC | #7

On Wed, Jun 17, 2020 at 02:32:09PM +0300, Vladimir Oltean wrote:
> On Wed, 17 Jun 2020 at 13:56, Russell King - ARM Linux admin
> <linux@armlinux.org.uk> wrote:
> >
> > On Tue, Jun 16, 2020 at 09:49:56AM +0200, Helmut Grohne wrote:
> > > The macb driver does not support configuring rgmii delays. At least for
> > > the Zynq GEM, delays are not supported by the hardware at all. However,
> > > the driver happily accepts and ignores any such delays.
> > >
> > > When operating in a mac to phy connection, the delay setting applies to
> > > the phy. Since the MAC does not support delays, the phy must provide
> > > them and the only supported mode is rgmii-id.  However, in a fixed mac
> > > to mac connection, the delay applies to the mac itself. Therefore the
> > > only supported rgmii mode is rgmii.
> >
> > This seems incorrect - see the phy documentation in
> > Documentation/networking/phy.rst:
> >
> > * PHY_INTERFACE_MODE_RGMII: the PHY is not responsible for inserting any
> >   internal delay by itself, it assumes that either the Ethernet MAC (if capable
> >   or the PCB traces) insert the correct 1.5-2ns delay
> >
> > * PHY_INTERFACE_MODE_RGMII_TXID: the PHY should insert an internal delay
> >   for the transmit data lines (TXD[3:0]) processed by the PHY device
> >
> > * PHY_INTERFACE_MODE_RGMII_RXID: the PHY should insert an internal delay
> >   for the receive data lines (RXD[3:0]) processed by the PHY device
> >
> > * PHY_INTERFACE_MODE_RGMII_ID: the PHY should insert internal delays for
> >   both transmit AND receive data lines from/to the PHY device
> >
> > Note that PHY_INTERFACE_MODE_RGMII, the delay can be added by _either_
> > the MAC or by PCB trace routing.
> >
> 
> What does it mean "can" be added? Is it or is it not added? As a MAC
> driver, what do you do?

I'm just stating what is documented.

> > The individual RGMII delay modes are more about what the PHY itself is
> > asked to do with respect to inserting delays, so I don't think your
> > patch makes sense.
> >
> 
> We all read the phy-mode documentation, but we aren't really any
> smarter. That document completely fails to address the existence of
> PCB traces.
> Helmut's link points to some more discussion around this topic.

Why are you so abrasive?

Not responding to this until you start behaving more reasonably.

Vladimir Oltean June 17, 2020, 11:38 a.m. UTC | #8

On Wed, 17 Jun 2020 at 14:34, Russell King - ARM Linux admin
<linux@armlinux.org.uk> wrote:
>

>
> Why are you so abrasive?
>
> Not responding to this until you start behaving more reasonably.
>
> --
> RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
> FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

Sorry.
What I meant to say is: the documentation is unclear. It has been
interpreted in conflicting ways, where the strict interpretations have
proven to have practical limitations, and the practical
interpretations have been accused of being incorrect. In my opinion
there is no way to fix it without introducing new bindings, which I am
not sure is really worth doing.

Thanks,
-Vladimir

Russell King (Oracle) June 17, 2020, 11:40 a.m. UTC | #9

On Wed, Jun 17, 2020 at 01:21:53PM +0200, Helmut Grohne wrote:
> On Wed, Jun 17, 2020 at 12:55:18PM +0200, Russell King - ARM Linux admin wrote:
> > The individual RGMII delay modes are more about what the PHY itself is
> > asked to do with respect to inserting delays, so I don't think your
> > patch makes sense.
> 
> This seems to be the same aspect that Vladimir Oltean remarked. I agree
> that the relevant hunk should be dropped.
> 
> > > --- a/drivers/net/ethernet/cadence/macb_main.c
> > > +++ b/drivers/net/ethernet/cadence/macb_main.c
> > > @@ -514,7 +514,7 @@ static void macb_validate(struct phylink_config *config,
> > >  	    state->interface != PHY_INTERFACE_MODE_RMII &&
> > >  	    state->interface != PHY_INTERFACE_MODE_GMII &&
> > >  	    state->interface != PHY_INTERFACE_MODE_SGMII &&
> > > -	    !phy_interface_mode_is_rgmii(state->interface)) {
> > > +	    state->interface != PHY_INTERFACE_MODE_RGMII_ID) {
> > 
> > Here you reject everything except PHY_INTERFACE_MODE_RGMII_ID.
> > 
> > >  		bitmap_zero(supported, __ETHTOOL_LINK_MODE_MASK_NBITS);
> > >  		return;
> > >  	}
> > > @@ -694,6 +694,13 @@ static int macb_phylink_connect(struct macb *bp)
> > >  	struct phy_device *phydev;
> > >  	int ret;
> > >  
> > > +	if (of_phy_is_fixed_link(dn) &&
> > > +	    phy_interface_mode_is_rgmii(bp->phy_interface) &&
> > > +	    bp->phy_interface != PHY_INTERFACE_MODE_RGMII) {
> > 
> > but here you reject everything except PHY_INTERFACE_MODE_RGMII.  These
> > can't both be right.  If you start with PHY_INTERFACE_MODE_RGMII, and
> > have a fixed link, you'll have PHY_INTERFACE_MODE_RGMII passed into
> > the validate function, which will then fail.
> 
> For a fixed-link, the validation function is never called. Therefore, it
> cannot reject PHY_INTERFACE_MODE_RGMII. It works in practice.

Hmm, I'm not so sure, but then I don't know exactly what code you're
using.  Looking at mainline, even for a fixed link, you call
phylink_create().  phylink_create() will spot the fixed link, and
parse the description, calling the validation function.  If that
fails, it will generate a warning at that point:

  "fixed link %s duplex %dMbps not recognised"

It doesn't cause an operational failure, but it means that you end up
with a zero supported mask, which is likely not expected.

This is not an expected situation, so I'll modify your claim to "it
works but issues a warning" which still means that it's not correct.

Helmut Grohne June 17, 2020, 11:52 a.m. UTC | #10

On Wed, Jun 17, 2020 at 01:40:25PM +0200, Russell King - ARM Linux admin wrote:
> > For a fixed-link, the validation function is never called. Therefore, it
> > cannot reject PHY_INTERFACE_MODE_RGMII. It works in practice.
> 
> Hmm, I'm not so sure, but then I don't know exactly what code you're
> using.  Looking at mainline, even for a fixed link, you call
> phylink_create().  phylink_create() will spot the fixed link, and
> parse the description, calling the validation function.  If that
> fails, it will generate a warning at that point:
> 
>   "fixed link %s duplex %dMbps not recognised"
> 
> It doesn't cause an operational failure, but it means that you end up
> with a zero supported mask, which is likely not expected.
> 
> This is not an expected situation, so I'll modify your claim to "it
> works but issues a warning" which still means that it's not correct.

I do see that warning. I agree with your correction of my claim. Thank
you for your attention to detail.

So we have two good reasons for not rejecting delay configuration in the
validation function now.

The remaining open question seems to be whether configuring a delay on a
MAC to MAC connection should cause a failure or a only warning. Do you
have an opinion on that?

All in-tree bindings of the driver seem to use rmii when they specify a
phy-mode.

Helmut

Russell King (Oracle) June 17, 2020, 11:57 a.m. UTC | #11

On Wed, Jun 17, 2020 at 02:38:25PM +0300, Vladimir Oltean wrote:
> On Wed, 17 Jun 2020 at 14:34, Russell King - ARM Linux admin
> <linux@armlinux.org.uk> wrote:
> >
> 
> >
> > Why are you so abrasive?
> >
> > Not responding to this until you start behaving more reasonably.
> >
> > --
> > RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
> > FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!
> 
> Sorry.
> What I meant to say is: the documentation is unclear. It has been
> interpreted in conflicting ways, where the strict interpretations have
> proven to have practical limitations, and the practical
> interpretations have been accused of being incorrect. In my opinion
> there is no way to fix it without introducing new bindings, which I am
> not sure is really worth doing.

The documentation was added in 2016, many years after we have had users
of this, in an attempt to clear up some of the confusion.  It is likely
that it had to cater for existing users though - I'm sure if Florian
cares, he can comment on that.

It would be better if it made a definitive statement about it, but doing
so would likely attract pedants to try to fix everything to conform,
causing breakage in the process.

I've recently had a painful experience of this with the Atheros PHYs,
where there are lots of platforms using "rgmii" when they should have
been using "rgmii-id".  A patch changed this in the Atheros code breaking
all these platforms, breakage which persisted over several kernel
versions, needing fixes to DT files that then had to be back-ported.
That's fine if you know what happened to break it, but if you don't, and
you don't know what the fix is, you're mostly stuffed and stuck with non-
working ethernet.  That really was not nice.

This is one of the reasons why I press for any new PHY interface mode
to be documented in the phylib documentation right from the start, so
that hopefully we can avoid this kind of thing in the future.

Russell King (Oracle) June 17, 2020, 12:08 p.m. UTC | #12

On Wed, Jun 17, 2020 at 01:52:01PM +0200, Helmut Grohne wrote:
> On Wed, Jun 17, 2020 at 01:40:25PM +0200, Russell King - ARM Linux admin wrote:
> > > For a fixed-link, the validation function is never called. Therefore, it
> > > cannot reject PHY_INTERFACE_MODE_RGMII. It works in practice.
> > 
> > Hmm, I'm not so sure, but then I don't know exactly what code you're
> > using.  Looking at mainline, even for a fixed link, you call
> > phylink_create().  phylink_create() will spot the fixed link, and
> > parse the description, calling the validation function.  If that
> > fails, it will generate a warning at that point:
> > 
> >   "fixed link %s duplex %dMbps not recognised"
> > 
> > It doesn't cause an operational failure, but it means that you end up
> > with a zero supported mask, which is likely not expected.
> > 
> > This is not an expected situation, so I'll modify your claim to "it
> > works but issues a warning" which still means that it's not correct.
> 
> I do see that warning. I agree with your correction of my claim. Thank
> you for your attention to detail.
> 
> So we have two good reasons for not rejecting delay configuration in the
> validation function now.
> 
> The remaining open question seems to be whether configuring a delay on a
> MAC to MAC connection should cause a failure or a only warning. Do you
> have an opinion on that?
> 
> All in-tree bindings of the driver seem to use rmii when they specify a
> phy-mode.

This brings up a problem in itself - the phy interface mode is
currently defined in terms of a MAC-to-PHY setup, not a MAC-to-MAC
setup.

With a fixed link, we could be in either a MAC-to-PHY or MAC-to-MAC
setup; we just don't know.  However, we don't have is access to the
PHY (if it exists) in the fixed link case to configure it for the
delay.

In the MAC-to-MAC RGMII setup, where neither MAC can insert the
necessary delay, the only way to have a RGMII conformant link is to
have the PCB traces induce the necessary delay. That errs towards
PHY_INTERFACE_MODE_RGMII for this case.

However, considering the MAC-to-PHY RGMII fixed link case, where the
PHY may not be accessible, and may be configured with the necessary
delay, should that case also use PHY_INTERFACE_MODE_RGMII - clearly
that would be as wrong as using PHY_INTERFACE_MODE_RGMII_ID would
be for the MAC-to-MAC RGMII with PCB-delays case.

So, I think a MAC driver should not care about the specific RGMII
mode being asked for in any case, and just accept them all.

I also think that some of this ought to be put in the documentation
as guidance for new implementations.

Helmut Grohne June 18, 2020, 8:14 a.m. UTC | #13

On Wed, Jun 17, 2020 at 02:08:09PM +0200, Russell King - ARM Linux admin wrote:
> With a fixed link, we could be in either a MAC-to-PHY or MAC-to-MAC
> setup; we just don't know.  However, we don't have is access to the
> PHY (if it exists) in the fixed link case to configure it for the
> delay.

Let me twist that a little: We may have access to the PHY, but we don't
always have access. When we do have access, we have a separate device
tree node with another fixed-link and another phy-mode. For fixed-links,
we specify the phy-mode for each end.

> In the MAC-to-MAC RGMII setup, where neither MAC can insert the
> necessary delay, the only way to have a RGMII conformant link is to
> have the PCB traces induce the necessary delay. That errs towards
> PHY_INTERFACE_MODE_RGMII for this case.

Yes.

> However, considering the MAC-to-PHY RGMII fixed link case, where the
> PHY may not be accessible, and may be configured with the necessary
> delay, should that case also use PHY_INTERFACE_MODE_RGMII - clearly
> that would be as wrong as using PHY_INTERFACE_MODE_RGMII_ID would
> be for the MAC-to-MAC RGMII with PCB-delays case.

If you take into account that the PHY has a separate node with phy-mode
being rgmii-id, it makes a lot more sense to use rgmii for the phy-mode
of the MAC. So I don't think it is that clear that doing so is wrong.

In an earlier discussion Florian Fainelli said:
https://patchwork.ozlabs.org/project/netdev/patch/20190410005700.31582-19-olteanv@gmail.com/#2149183
| fixed-link really should denote a MAC to MAC connection so if you have
| "rgmii-id" on one side, you would expect "rgmii" on the other side
| (unless PCB traces account for delays, grrr).

For these reasons, I still think that rgmii would be a useful
description for the fixed-link to PHY connection where the PHY adds the
delay.

> So, I think a MAC driver should not care about the specific RGMII
> mode being asked for in any case, and just accept them all.

I would like to agree to this. However, the implication is that when you
get your delays wrong, your driver silently ignores you and you never
notice your mistake until you see no traffic passing and wonder why.

In this case, I was faced with a PHY that would do rgmii-txid and I
configured that on the MAC. Unfortunately, macb_main.c didn't tell me
that it did rgmii-id instead.

> I also think that some of this ought to be put in the documentation
> as guidance for new implementations.

That seems to be the part where everyone agrees.

Given the state of the discussion, I'm wondering whether this could be
fixed at a more fundamental level in the device tree bindings.

A number of people (including you) pointed out that retroactively fixing
the meaning of phy modes does not work and causes pain instead. That
hints that the only way to fix this is adding new properties. How about
this?

rgmii-delay-type:
  description:
    Responsibility for adding the rgmii delay
  enum:
    # The remote PHY or MAC to this MAC is responsible for adding the
    # delay.
    - remote
    # The delay is added by neither MAC nor MAC, but using PCB traces
    # instead.
    - pcb
    # The MAC must add the delay.
    - local
rgmii-rx-delay:
  # Responsibility for RX delay. Delay specification in the phy-mode is
  # ignored when this is present.
  $ref: "#/properties/rgmii-delay-type"
rgmii-tx-delay:
  # Responsibility for TX delay. Delay specification in the phy-mode is
  # ignored when this is present.
  $ref: "#/properties/rgmii-delay-type"

The naming is up to discussion, but I think you get the idea. The core
properties of this proposal are:
 * It does not break existing device trees.
 * It completely resolves the present ambiguity.
The major downside is that you never know whether your driver supports
such delay specifications already.

Helmut

Russell King (Oracle) June 18, 2020, 8:45 a.m. UTC | #14

On Thu, Jun 18, 2020 at 10:14:33AM +0200, Helmut Grohne wrote:
> On Wed, Jun 17, 2020 at 02:08:09PM +0200, Russell King - ARM Linux admin wrote:
> > With a fixed link, we could be in either a MAC-to-PHY or MAC-to-MAC
> > setup; we just don't know.  However, we don't have is access to the
> > PHY (if it exists) in the fixed link case to configure it for the
> > delay.
> 
> Let me twist that a little: We may have access to the PHY, but we don't
> always have access. When we do have access, we have a separate device
> tree node with another fixed-link and another phy-mode. For fixed-links,
> we specify the phy-mode for each end.

If you have access to the PHY, you shouldn't be using fixed-link. In
any case, there is no support for a fixed-link specification at the
PHY end in the kernel.  The PHY binding doc does not allow for this
either.

> > In the MAC-to-MAC RGMII setup, where neither MAC can insert the
> > necessary delay, the only way to have a RGMII conformant link is to
> > have the PCB traces induce the necessary delay. That errs towards
> > PHY_INTERFACE_MODE_RGMII for this case.
> 
> Yes.
> 
> > However, considering the MAC-to-PHY RGMII fixed link case, where the
> > PHY may not be accessible, and may be configured with the necessary
> > delay, should that case also use PHY_INTERFACE_MODE_RGMII - clearly
> > that would be as wrong as using PHY_INTERFACE_MODE_RGMII_ID would
> > be for the MAC-to-MAC RGMII with PCB-delays case.
> 
> If you take into account that the PHY has a separate node with phy-mode
> being rgmii-id, it makes a lot more sense to use rgmii for the phy-mode
> of the MAC. So I don't think it is that clear that doing so is wrong.

The PHY binding document does not allow for this, neither does the
kernel.

> In an earlier discussion Florian Fainelli said:
> https://patchwork.ozlabs.org/project/netdev/patch/20190410005700.31582-19-olteanv@gmail.com/#2149183
> | fixed-link really should denote a MAC to MAC connection so if you have
> | "rgmii-id" on one side, you would expect "rgmii" on the other side
> | (unless PCB traces account for delays, grrr).
> 
> For these reasons, I still think that rgmii would be a useful
> description for the fixed-link to PHY connection where the PHY adds the
> delay.

I think Florian is wrong; consider what it means when you have a
fixed-link connection between a MAC and an inaccessible PHY and
the link as a whole is operating in what we would call "rgmii-id"
mode if the PHY were accessible.

Taking Florian's stance, it basically means that DT no longer
describes the hardware, but how we have chosen to interpret the
properties in _one_ specific case in a completely different way
to all the other cases.

> > So, I think a MAC driver should not care about the specific RGMII
> > mode being asked for in any case, and just accept them all.
> 
> I would like to agree to this. However, the implication is that when you
> get your delays wrong, your driver silently ignores you and you never
> notice your mistake until you see no traffic passing and wonder why.

So explain to me this aspect of your reasoning:

- If the link is operating in non-fixed-link mode, the rgmii* PHY modes
  describe the delay to be applied at the PHY end.
- If the link is operating in fixed-link mode, the rgmii* PHY modes
  describe the delay to be applied at the MAC end.

That doesn't make sense, and excludes properly describing a MAC-to-
inaccessible-PHY setup.

It also means that we're having to conditionalise how we deal with
this PHY mode in every single driver, which means that every single
driver is going to end up interpreting it differently, and it's going
to become a buggy mess.

> In this case, I was faced with a PHY that would do rgmii-txid and I
> configured that on the MAC. Unfortunately, macb_main.c didn't tell me
> that it did rgmii-id instead.

The documentation makes it clear that "rgmii-*" (note the hyphen) are
to be applied by the PHY *only*, and not by the MAC.

> > I also think that some of this ought to be put in the documentation
> > as guidance for new implementations.
> 
> That seems to be the part where everyone agrees.
> 
> Given the state of the discussion, I'm wondering whether this could be
> fixed at a more fundamental level in the device tree bindings.
> 
> A number of people (including you) pointed out that retroactively fixing
> the meaning of phy modes does not work and causes pain instead. That
> hints that the only way to fix this is adding new properties. How about
> this?
> 
> rgmii-delay-type:
>   description:
>     Responsibility for adding the rgmii delay
>   enum:
>     # The remote PHY or MAC to this MAC is responsible for adding the
>     # delay.
>     - remote
>     # The delay is added by neither MAC nor MAC, but using PCB traces
>     # instead.
>     - pcb
>     # The MAC must add the delay.
>     - local

Why do we need that complexity?  If we decide that we can allow
phy-mode = "rgmii" and introduce new properties to control the
delay, then we just need:

  rgmii-tx-delay-ps = <nnn>;
  rgmii-rx-delay-ps = <nnn>;

specified in the MAC node (to be applied only at the MAC end) or
specified in the PHY node (to be applied only at the PHY end.)
In the normal case, this would be the standard delay value, but
in exceptional cases where supported, the delays can be arbitary.
We know there are PHYs out there which allow other delays.

This means each end is responsible for parsing these properties in
its own node and applying them - or raising an error if they can't
be supported.

With your "rgmii-delay-type" idea, if this is only specified at the
MAC end, then the PHY code somehow needs to know what that setting is,
which adds a lot of complexity - the PHY code has to go digging for
the MAC node, we may even have to introduce a back-reference from the
PHY node to the MAC node so the PHY can find it.  There are MAC drivers
out there where there is one struct device, but multiple net devices,
so digging inside struct net_device to get at the parent struct device,
and then trying to parse "rgmii-delay-type" from the of-node won't work.

Helmut Grohne June 18, 2020, 9:05 a.m. UTC | #15

On Thu, Jun 18, 2020 at 10:45:54AM +0200, Russell King - ARM Linux admin wrote:
> Why do we need that complexity?  If we decide that we can allow
> phy-mode = "rgmii" and introduce new properties to control the
> delay, then we just need:
> 
>   rgmii-tx-delay-ps = <nnn>;
>   rgmii-rx-delay-ps = <nnn>;
> 
> specified in the MAC node (to be applied only at the MAC end) or
> specified in the PHY node (to be applied only at the PHY end.)
> In the normal case, this would be the standard delay value, but
> in exceptional cases where supported, the delays can be arbitary.
> We know there are PHYs out there which allow other delays.
> 
> This means each end is responsible for parsing these properties in
> its own node and applying them - or raising an error if they can't
> be supported.

Thank you. That makes a lot more sense while keeping the (imo) important
properties of my proposal:
 * It is backwards compatible. These properties override delays
   specified inside phy-mode. Otherwise the vague phy-mode meaning is
   retained.
 * The ambiguity is resolved. It is always clear where delays should be
   configure and the properties properly account for possible PCB
   traces.

It also resolves my original problem. If support for these properties is
added to macb_main.c, it would simply check that both delays are 0 as
internal delays are not supported by the hardware.  When I would have
attempted to configure an rx delay, it would have nicely errored out.

How can we achieve wider consensus on this and put it into the dt
specification? Should there be drivers supporting these first?

Helmut

Russell King (Oracle) June 18, 2020, 10:01 a.m. UTC | #16

On Thu, Jun 18, 2020 at 11:05:26AM +0200, Helmut Grohne wrote:
> On Thu, Jun 18, 2020 at 10:45:54AM +0200, Russell King - ARM Linux admin wrote:
> > Why do we need that complexity?  If we decide that we can allow
> > phy-mode = "rgmii" and introduce new properties to control the
> > delay, then we just need:
> > 
> >   rgmii-tx-delay-ps = <nnn>;
> >   rgmii-rx-delay-ps = <nnn>;
> > 
> > specified in the MAC node (to be applied only at the MAC end) or
> > specified in the PHY node (to be applied only at the PHY end.)
> > In the normal case, this would be the standard delay value, but
> > in exceptional cases where supported, the delays can be arbitary.
> > We know there are PHYs out there which allow other delays.
> > 
> > This means each end is responsible for parsing these properties in
> > its own node and applying them - or raising an error if they can't
> > be supported.
> 
> Thank you. That makes a lot more sense while keeping the (imo) important
> properties of my proposal:
>  * It is backwards compatible. These properties override delays
>    specified inside phy-mode. Otherwise the vague phy-mode meaning is
>    retained.
>  * The ambiguity is resolved. It is always clear where delays should be
>    configure and the properties properly account for possible PCB
>    traces.
> 
> It also resolves my original problem. If support for these properties is
> added to macb_main.c, it would simply check that both delays are 0 as
> internal delays are not supported by the hardware.  When I would have
> attempted to configure an rx delay, it would have nicely errored out.

I think we'd want a helper or two to do the parsing and return the
delays, something like:

#define PHY_RGMII_DELAY_PS_NONE	0
#define PHY_RGMII_DELAY_PS_STD	1500

/* @np here should be the MAC node */
int of_mac_get_delays(struct device_node *np,
		      phy_interface_mode interface,
		      u32 *tx_delay_ps, u32 *rx_delay_ps)
{
	*tx_delay_ps = PHY_RGMII_DELAY_PS_NONE;
	*rx_delay_ps = PHY_RGMII_DELAY_PS_NONE;

	if (!np)
		return 0;

	if (interface == PHY_INTERFACE_MODE_RGMII) {
		of_property_read_u32(np, "rgmii-tx-delay-ps", tx_delay_ps);
		of_property_read_u32(np, "rgmii-rx-delay-ps", rx_delay_ps);
	}

	return 0;
}

/* @np here should be the PHY node */
int of_phy_get_delays(struct device_node *np,
		      phy_interface_mode interface,
		      u32 *tx_delay_ps, u32 *rx_delay_ps)
{
	switch (interface) {
	case PHY_INTERFACE_MODE_RGMII_ID:
		*tx_delay_ps = PHY_RGMII_DELAY_PS_STD;
		*rx_delay_ps = PHY_RGMII_DELAY_PS_STD;
		return 0;

	case PHY_INTERFACE_MODE_RGMII_RXID:
		*tx_delay_ps = PHY_RGMII_DELAY_PS_NONE;
		*rx_delay_ps = PHY_RGMII_DELAY_PS_STD;
		return 0;

	case PHY_INTERFACE_MODE_RGMII_TXID:
		*tx_delay_ps = PHY_RGMII_DELAY_PS_STD;
		*rx_delay_ps = PHY_RGMII_DELAY_PS_NONE;
		return 0;

	default:
		return of_mac_get_delays(np, interface,
					 tx_delay_ps, rx_delay_ps);
	}
}

as a first cut - validation left up to the user of these.  At least
we then have an easy interface for PHY drivers to use, for example:

static int m88e1121_config_aneg_rgmii_delays(struct phy_device *phydev)
{
	u32 tx_delay_ps, rx_delay_ps;
	int err;

	err = of_phy_get_delays(phydev->mdio.dev.of_node,
				phydev->interface, &tx_delay_ps,
				&rx_delay_ps);
	if (err)
		return err;

	mscr = 0;
	if (tx_delay_ps == PHY_RGMII_DELAY_PS_STD)
		mscr |= MII_88E1121_PHY_MSCR_TX_DELAY;
	else if (tx_delay_ps != PHY_RGMII_DELAY_PS_NONE)
		/* ... log an error to kernel log */
		return -EINVAL;

	if (rx_delay_ps == PHY_RGMII_DELAY_PS_STD)
		mscr |= MII_88E1121_PHY_MSCR_RX_DELAY;
	else if (rx_delay_ps != PHY_RGMII_DELAY_PS_NONE)
		/* ... log an error to kernel log */
		return -EINVAL;

	return phy_modify_paged(phydev, MII_MARVELL_MSCR_PAGE,
				MII_88E1121_PHY_MSCR_REG,
				MII_88E1121_PHY_MSCR_DELAY_MASK, mscr);
}

> How can we achieve wider consensus on this and put it into the dt
> specification? Should there be drivers supporting these first?

Provide an illustration of the idea in code form for consideration? ;)

Florian Fainelli June 18, 2020, 6:06 p.m. UTC | #17

On 6/18/2020 1:45 AM, Russell King - ARM Linux admin wrote:
> On Thu, Jun 18, 2020 at 10:14:33AM +0200, Helmut Grohne wrote:
>> On Wed, Jun 17, 2020 at 02:08:09PM +0200, Russell King - ARM Linux admin wrote:
>>> With a fixed link, we could be in either a MAC-to-PHY or MAC-to-MAC
>>> setup; we just don't know.  However, we don't have is access to the
>>> PHY (if it exists) in the fixed link case to configure it for the
>>> delay.
>>
>> Let me twist that a little: We may have access to the PHY, but we don't
>> always have access. When we do have access, we have a separate device
>> tree node with another fixed-link and another phy-mode. For fixed-links,
>> we specify the phy-mode for each end.
> 
> If you have access to the PHY, you shouldn't be using fixed-link. In
> any case, there is no support for a fixed-link specification at the
> PHY end in the kernel.  The PHY binding doc does not allow for this
> either.
> 
>>> In the MAC-to-MAC RGMII setup, where neither MAC can insert the
>>> necessary delay, the only way to have a RGMII conformant link is to
>>> have the PCB traces induce the necessary delay. That errs towards
>>> PHY_INTERFACE_MODE_RGMII for this case.
>>
>> Yes.
>>
>>> However, considering the MAC-to-PHY RGMII fixed link case, where the
>>> PHY may not be accessible, and may be configured with the necessary
>>> delay, should that case also use PHY_INTERFACE_MODE_RGMII - clearly
>>> that would be as wrong as using PHY_INTERFACE_MODE_RGMII_ID would
>>> be for the MAC-to-MAC RGMII with PCB-delays case.
>>
>> If you take into account that the PHY has a separate node with phy-mode
>> being rgmii-id, it makes a lot more sense to use rgmii for the phy-mode
>> of the MAC. So I don't think it is that clear that doing so is wrong.
> 
> The PHY binding document does not allow for this, neither does the
> kernel.
> 
>> In an earlier discussion Florian Fainelli said:
>> https://patchwork.ozlabs.org/project/netdev/patch/20190410005700.31582-19-olteanv@gmail.com/#2149183
>> | fixed-link really should denote a MAC to MAC connection so if you have
>> | "rgmii-id" on one side, you would expect "rgmii" on the other side
>> | (unless PCB traces account for delays, grrr).
>>
>> For these reasons, I still think that rgmii would be a useful
>> description for the fixed-link to PHY connection where the PHY adds the
>> delay.
> 
> I think Florian is wrong; consider what it means when you have a
> fixed-link connection between a MAC and an inaccessible PHY and
> the link as a whole is operating in what we would call "rgmii-id"
> mode if the PHY were accessible.
> 
> Taking Florian's stance, it basically means that DT no longer
> describes the hardware, but how we have chosen to interpret the
> properties in _one_ specific case in a completely different way
> to all the other cases.

How do you ensure that a MAC to MAC connection using RGMII actually
works if you do not provide a 'phy-mode' for both sides right now?

Yes this is more of a DT now describes a configuration choice rather
than the hardware but given that Ethernet MACs are usually capable of
supporting all 4 possible RGMII modes, how do you propose to solve the
negotiation of the appropriate RGMII mode here?

> 
>>> So, I think a MAC driver should not care about the specific RGMII
>>> mode being asked for in any case, and just accept them all.
>>
>> I would like to agree to this. However, the implication is that when you
>> get your delays wrong, your driver silently ignores you and you never
>> notice your mistake until you see no traffic passing and wonder why.
> 
> So explain to me this aspect of your reasoning:
> 
> - If the link is operating in non-fixed-link mode, the rgmii* PHY modes
>   describe the delay to be applied at the PHY end.
> - If the link is operating in fixed-link mode, the rgmii* PHY modes
>   describe the delay to be applied at the MAC end.
> 
> That doesn't make sense, and excludes properly describing a MAC-to-
> inaccessible-PHY setup.

The point with a fixed link is that it does not matter what is the other
end, it could be a MAC, it could be a PHY, it could go nowhere, it just
does not matter, the only thing that you can know is you configure your
side of the fixed link. If you have visibility into the other side as
well, you can go one step further an put something in the other
fixed-link node that makes sense and will result in a working RGMII link.

> 
> It also means that we're having to conditionalise how we deal with
> this PHY mode in every single driver, which means that every single
> driver is going to end up interpreting it differently, and it's going
> to become a buggy mess.

Newsflash: it is already a buggy mess.

> 
>> In this case, I was faced with a PHY that would do rgmii-txid and I
>> configured that on the MAC. Unfortunately, macb_main.c didn't tell me
>> that it did rgmii-id instead.
> 
> The documentation makes it clear that "rgmii-*" (note the hyphen) are
> to be applied by the PHY *only*, and not by the MAC.
> 
>>> I also think that some of this ought to be put in the documentation
>>> as guidance for new implementations.
>>
>> That seems to be the part where everyone agrees.
>>
>> Given the state of the discussion, I'm wondering whether this could be
>> fixed at a more fundamental level in the device tree bindings.
>>
>> A number of people (including you) pointed out that retroactively fixing
>> the meaning of phy modes does not work and causes pain instead. That
>> hints that the only way to fix this is adding new properties. How about
>> this?
>>
>> rgmii-delay-type:
>>   description:
>>     Responsibility for adding the rgmii delay
>>   enum:
>>     # The remote PHY or MAC to this MAC is responsible for adding the
>>     # delay.
>>     - remote
>>     # The delay is added by neither MAC nor MAC, but using PCB traces
>>     # instead.
>>     - pcb
>>     # The MAC must add the delay.
>>     - local
> 
> Why do we need that complexity?  If we decide that we can allow
> phy-mode = "rgmii" and introduce new properties to control the
> delay, then we just need:
> 
>   rgmii-tx-delay-ps = <nnn>;
>   rgmii-rx-delay-ps = <nnn>;
> 
> specified in the MAC node (to be applied only at the MAC end) or
> specified in the PHY node (to be applied only at the PHY end.)
> In the normal case, this would be the standard delay value, but
> in exceptional cases where supported, the delays can be arbitary.
> We know there are PHYs out there which allow other delays.
> 
> This means each end is responsible for parsing these properties in
> its own node and applying them - or raising an error if they can't
> be supported.

And as we all know, RGMII is the most plug and play standard out there
so this is "just going to work".

> 
> With your "rgmii-delay-type" idea, if this is only specified at the
> MAC end, then the PHY code somehow needs to know what that setting is,
> which adds a lot of complexity - the PHY code has to go digging for
> the MAC node, we may even have to introduce a back-reference from the
> PHY node to the MAC node so the PHY can find it.  There are MAC drivers
> out there where there is one struct device, but multiple net devices,
> so digging inside struct net_device to get at the parent struct device,
> and then trying to parse "rgmii-delay-type" from the of-node won't work.
>-- 
Florian

Florian Fainelli June 18, 2020, 6:25 p.m. UTC | #18

On 6/17/2020 4:57 AM, Russell King - ARM Linux admin wrote:
> On Wed, Jun 17, 2020 at 02:38:25PM +0300, Vladimir Oltean wrote:
>> On Wed, 17 Jun 2020 at 14:34, Russell King - ARM Linux admin
>> <linux@armlinux.org.uk> wrote:
>>>
>>
>>>
>>> Why are you so abrasive?
>>>
>>> Not responding to this until you start behaving more reasonably.
>>>
>>> --
>>> RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
>>> FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!
>>
>> Sorry.
>> What I meant to say is: the documentation is unclear. It has been
>> interpreted in conflicting ways, where the strict interpretations have
>> proven to have practical limitations, and the practical
>> interpretations have been accused of being incorrect. In my opinion
>> there is no way to fix it without introducing new bindings, which I am
>> not sure is really worth doing.
> 
> The documentation was added in 2016, many years after we have had users
> of this, in an attempt to clear up some of the confusion.  It is likely
> that it had to cater for existing users though - I'm sure if Florian
> cares, he can comment on that.

The need to clarify the 'phy-mode' property came in while reviewing the
aurora network driver and trying to make sense of what should be done
for a new driver in a way that would result in hopefully less confusion
moving forward.

> 
> It would be better if it made a definitive statement about it, but doing
> so would likely attract pedants to try to fix everything to conform,
> causing breakage in the process.

Exactly.

> 
> I've recently had a painful experience of this with the Atheros PHYs,
> where there are lots of platforms using "rgmii" when they should have
> been using "rgmii-id".  A patch changed this in the Atheros code breaking
> all these platforms, breakage which persisted over several kernel
> versions, needing fixes to DT files that then had to be back-ported.
> That's fine if you know what happened to break it, but if you don't, and
> you don't know what the fix is, you're mostly stuffed and stuck with non-
> working ethernet.  That really was not nice.
> 
> This is one of the reasons why I press for any new PHY interface mode
> to be documented in the phylib documentation right from the start, so
> that hopefully we can avoid this kind of thing in the future.
> 

It seems to be that this is a very RGMII specific problem and no other
electrical interface appears to have that problem, or it solves it
differently without requiring massive baby sitting from software (or
even more, just not for that particular problem maybe).

Florian Fainelli June 18, 2020, 6:26 p.m. UTC | #19

On 6/18/2020 3:01 AM, Russell King - ARM Linux admin wrote:
> On Thu, Jun 18, 2020 at 11:05:26AM +0200, Helmut Grohne wrote:
>> On Thu, Jun 18, 2020 at 10:45:54AM +0200, Russell King - ARM Linux admin wrote:
>>> Why do we need that complexity?  If we decide that we can allow
>>> phy-mode = "rgmii" and introduce new properties to control the
>>> delay, then we just need:
>>>
>>>   rgmii-tx-delay-ps = <nnn>;
>>>   rgmii-rx-delay-ps = <nnn>;
>>>
>>> specified in the MAC node (to be applied only at the MAC end) or
>>> specified in the PHY node (to be applied only at the PHY end.)
>>> In the normal case, this would be the standard delay value, but
>>> in exceptional cases where supported, the delays can be arbitary.
>>> We know there are PHYs out there which allow other delays.
>>>
>>> This means each end is responsible for parsing these properties in
>>> its own node and applying them - or raising an error if they can't
>>> be supported.
>>
>> Thank you. That makes a lot more sense while keeping the (imo) important
>> properties of my proposal:
>>  * It is backwards compatible. These properties override delays
>>    specified inside phy-mode. Otherwise the vague phy-mode meaning is
>>    retained.
>>  * The ambiguity is resolved. It is always clear where delays should be
>>    configure and the properties properly account for possible PCB
>>    traces.
>>
>> It also resolves my original problem. If support for these properties is
>> added to macb_main.c, it would simply check that both delays are 0 as
>> internal delays are not supported by the hardware.  When I would have
>> attempted to configure an rx delay, it would have nicely errored out.
> 
> I think we'd want a helper or two to do the parsing and return the
> delays, something like:

Can you review Dan's patch series since he is attempting something that
may not end up solving Helmut's problem completely:

http://patchwork.ozlabs.org/project/netdev/list/?series=184052

Russell King (Oracle) June 18, 2020, 6:49 p.m. UTC | #20

On Thu, Jun 18, 2020 at 11:06:43AM -0700, Florian Fainelli wrote:
> 
> 
> On 6/18/2020 1:45 AM, Russell King - ARM Linux admin wrote:
> > On Thu, Jun 18, 2020 at 10:14:33AM +0200, Helmut Grohne wrote:
> >> On Wed, Jun 17, 2020 at 02:08:09PM +0200, Russell King - ARM Linux admin wrote:
> >>> With a fixed link, we could be in either a MAC-to-PHY or MAC-to-MAC
> >>> setup; we just don't know.  However, we don't have is access to the
> >>> PHY (if it exists) in the fixed link case to configure it for the
> >>> delay.
> >>
> >> Let me twist that a little: We may have access to the PHY, but we don't
> >> always have access. When we do have access, we have a separate device
> >> tree node with another fixed-link and another phy-mode. For fixed-links,
> >> we specify the phy-mode for each end.
> > 
> > If you have access to the PHY, you shouldn't be using fixed-link. In
> > any case, there is no support for a fixed-link specification at the
> > PHY end in the kernel.  The PHY binding doc does not allow for this
> > either.
> > 
> >>> In the MAC-to-MAC RGMII setup, where neither MAC can insert the
> >>> necessary delay, the only way to have a RGMII conformant link is to
> >>> have the PCB traces induce the necessary delay. That errs towards
> >>> PHY_INTERFACE_MODE_RGMII for this case.
> >>
> >> Yes.
> >>
> >>> However, considering the MAC-to-PHY RGMII fixed link case, where the
> >>> PHY may not be accessible, and may be configured with the necessary
> >>> delay, should that case also use PHY_INTERFACE_MODE_RGMII - clearly
> >>> that would be as wrong as using PHY_INTERFACE_MODE_RGMII_ID would
> >>> be for the MAC-to-MAC RGMII with PCB-delays case.
> >>
> >> If you take into account that the PHY has a separate node with phy-mode
> >> being rgmii-id, it makes a lot more sense to use rgmii for the phy-mode
> >> of the MAC. So I don't think it is that clear that doing so is wrong.
> > 
> > The PHY binding document does not allow for this, neither does the
> > kernel.
> > 
> >> In an earlier discussion Florian Fainelli said:
> >> https://patchwork.ozlabs.org/project/netdev/patch/20190410005700.31582-19-olteanv@gmail.com/#2149183
> >> | fixed-link really should denote a MAC to MAC connection so if you have
> >> | "rgmii-id" on one side, you would expect "rgmii" on the other side
> >> | (unless PCB traces account for delays, grrr).
> >>
> >> For these reasons, I still think that rgmii would be a useful
> >> description for the fixed-link to PHY connection where the PHY adds the
> >> delay.
> > 
> > I think Florian is wrong; consider what it means when you have a
> > fixed-link connection between a MAC and an inaccessible PHY and
> > the link as a whole is operating in what we would call "rgmii-id"
> > mode if the PHY were accessible.
> > 
> > Taking Florian's stance, it basically means that DT no longer
> > describes the hardware, but how we have chosen to interpret the
> > properties in _one_ specific case in a completely different way
> > to all the other cases.
> 
> How do you ensure that a MAC to MAC connection using RGMII actually
> works if you do not provide a 'phy-mode' for both sides right now?
> 
> Yes this is more of a DT now describes a configuration choice rather
> than the hardware but given that Ethernet MACs are usually capable of
> supporting all 4 possible RGMII modes, how do you propose to solve the
> negotiation of the appropriate RGMII mode here?

This is actually answered by yourself below.

> >>> So, I think a MAC driver should not care about the specific RGMII
> >>> mode being asked for in any case, and just accept them all.
> >>
> >> I would like to agree to this. However, the implication is that when you
> >> get your delays wrong, your driver silently ignores you and you never
> >> notice your mistake until you see no traffic passing and wonder why.
> > 
> > So explain to me this aspect of your reasoning:
> > 
> > - If the link is operating in non-fixed-link mode, the rgmii* PHY modes
> >   describe the delay to be applied at the PHY end.
> > - If the link is operating in fixed-link mode, the rgmii* PHY modes
> >   describe the delay to be applied at the MAC end.
> > 
> > That doesn't make sense, and excludes properly describing a MAC-to-
> > inaccessible-PHY setup.
> 
> The point with a fixed link is that it does not matter what is the other
> end, it could be a MAC, it could be a PHY, it could go nowhere, it just
> does not matter, the only thing that you can know is you configure your
> side of the fixed link.

That is *exactly* my point.

Today, with a PHY on the other end, the four RGMII modes tell you how
the PHY is to be configured, not the MAC.  The documentation states
this.

So, why should a fixed-link be any different?

Florian Fainelli June 18, 2020, 7:02 p.m. UTC | #21

On 6/18/2020 11:49 AM, Russell King - ARM Linux admin wrote:
> On Thu, Jun 18, 2020 at 11:06:43AM -0700, Florian Fainelli wrote:
>>
>>
>> On 6/18/2020 1:45 AM, Russell King - ARM Linux admin wrote:
>>> On Thu, Jun 18, 2020 at 10:14:33AM +0200, Helmut Grohne wrote:
>>>> On Wed, Jun 17, 2020 at 02:08:09PM +0200, Russell King - ARM Linux admin wrote:
>>>>> With a fixed link, we could be in either a MAC-to-PHY or MAC-to-MAC
>>>>> setup; we just don't know.  However, we don't have is access to the
>>>>> PHY (if it exists) in the fixed link case to configure it for the
>>>>> delay.
>>>>
>>>> Let me twist that a little: We may have access to the PHY, but we don't
>>>> always have access. When we do have access, we have a separate device
>>>> tree node with another fixed-link and another phy-mode. For fixed-links,
>>>> we specify the phy-mode for each end.
>>>
>>> If you have access to the PHY, you shouldn't be using fixed-link. In
>>> any case, there is no support for a fixed-link specification at the
>>> PHY end in the kernel.  The PHY binding doc does not allow for this
>>> either.
>>>
>>>>> In the MAC-to-MAC RGMII setup, where neither MAC can insert the
>>>>> necessary delay, the only way to have a RGMII conformant link is to
>>>>> have the PCB traces induce the necessary delay. That errs towards
>>>>> PHY_INTERFACE_MODE_RGMII for this case.
>>>>
>>>> Yes.
>>>>
>>>>> However, considering the MAC-to-PHY RGMII fixed link case, where the
>>>>> PHY may not be accessible, and may be configured with the necessary
>>>>> delay, should that case also use PHY_INTERFACE_MODE_RGMII - clearly
>>>>> that would be as wrong as using PHY_INTERFACE_MODE_RGMII_ID would
>>>>> be for the MAC-to-MAC RGMII with PCB-delays case.
>>>>
>>>> If you take into account that the PHY has a separate node with phy-mode
>>>> being rgmii-id, it makes a lot more sense to use rgmii for the phy-mode
>>>> of the MAC. So I don't think it is that clear that doing so is wrong.
>>>
>>> The PHY binding document does not allow for this, neither does the
>>> kernel.
>>>
>>>> In an earlier discussion Florian Fainelli said:
>>>> https://patchwork.ozlabs.org/project/netdev/patch/20190410005700.31582-19-olteanv@gmail.com/#2149183
>>>> | fixed-link really should denote a MAC to MAC connection so if you have
>>>> | "rgmii-id" on one side, you would expect "rgmii" on the other side
>>>> | (unless PCB traces account for delays, grrr).
>>>>
>>>> For these reasons, I still think that rgmii would be a useful
>>>> description for the fixed-link to PHY connection where the PHY adds the
>>>> delay.
>>>
>>> I think Florian is wrong; consider what it means when you have a
>>> fixed-link connection between a MAC and an inaccessible PHY and
>>> the link as a whole is operating in what we would call "rgmii-id"
>>> mode if the PHY were accessible.
>>>
>>> Taking Florian's stance, it basically means that DT no longer
>>> describes the hardware, but how we have chosen to interpret the
>>> properties in _one_ specific case in a completely different way
>>> to all the other cases.
>>
>> How do you ensure that a MAC to MAC connection using RGMII actually
>> works if you do not provide a 'phy-mode' for both sides right now?
>>
>> Yes this is more of a DT now describes a configuration choice rather
>> than the hardware but given that Ethernet MACs are usually capable of
>> supporting all 4 possible RGMII modes, how do you propose to solve the
>> negotiation of the appropriate RGMII mode here?
> 
> This is actually answered by yourself below.
> 
>>>>> So, I think a MAC driver should not care about the specific RGMII
>>>>> mode being asked for in any case, and just accept them all.
>>>>
>>>> I would like to agree to this. However, the implication is that when you
>>>> get your delays wrong, your driver silently ignores you and you never
>>>> notice your mistake until you see no traffic passing and wonder why.
>>>
>>> So explain to me this aspect of your reasoning:
>>>
>>> - If the link is operating in non-fixed-link mode, the rgmii* PHY modes
>>>   describe the delay to be applied at the PHY end.
>>> - If the link is operating in fixed-link mode, the rgmii* PHY modes
>>>   describe the delay to be applied at the MAC end.
>>>
>>> That doesn't make sense, and excludes properly describing a MAC-to-
>>> inaccessible-PHY setup.
>>
>> The point with a fixed link is that it does not matter what is the other
>> end, it could be a MAC, it could be a PHY, it could go nowhere, it just
>> does not matter, the only thing that you can know is you configure your
>> side of the fixed link.
> 
> That is *exactly* my point.
> 
> Today, with a PHY on the other end, the four RGMII modes tell you how
> the PHY is to be configured, not the MAC.  The documentation states
> this.
> 
> So, why should a fixed-link be any different?

It should not, but that means that when you describe the fixed-link, the
'phy-mode' within the local fixed-link node is meant to describe how the
other side is configured not how you are configured. For some reason I
did not consider that to be intuitive when I sent out my response, but I
suppose spelling it out would greatly help.

Russell King (Oracle) June 18, 2020, 7:10 p.m. UTC | #22

On Thu, Jun 18, 2020 at 12:02:13PM -0700, Florian Fainelli wrote:
> It should not, but that means that when you describe the fixed-link, the
> 'phy-mode' within the local fixed-link node is meant to describe how the
> other side is configured not how you are configured. For some reason I
> did not consider that to be intuitive when I sent out my response, but I
> suppose spelling it out would greatly help.

Right, so in the MAC-to-MAC setup that is being discussed in this
thread, the _local_ side MAC where phy-mode is specified should not
care which of the RGMII modes is specified, because the delays are
a function of "the other side" of the link.

So, the proposal that macb should limit itself to only accepting
"rgmii" in it's _local_ phy-mode property because the _local_ MAC
does not support delays is wrong.

net: macb: reject unsupported rgmii delays

Commit Message

Comments

Patch