diff mbox series

[net-next,v2,1/1] net: dsa: fix fixed-link port registration

Message ID 20190811031857.2899-1-marek.behun@nic.cz
State Changes Requested
Delegated to: David Miller
Headers show
Series [net-next,v2,1/1] net: dsa: fix fixed-link port registration | expand

Commit Message

Marek Behún Aug. 11, 2019, 3:18 a.m. UTC
Commit 88d6272acaaa ("net: phy: avoid unneeded MDIO reads in
genphy_read_status") broke fixed link DSA port registration in
dsa_port_fixed_link_register_of: the genphy_read_status does not do what
it is supposed to and the following adjust_link is given wrong
parameters.

This causes a regression on Turris Omnia, where the mvneta driver for
the interface connected to the switch reports crc errors, for some
reason.

I realize this fix is not ideal, something else could change in genphy
functions which could cause DSA fixed-link port to break again.
Hopefully DSA fixed-link port functionality will be converted to phylink
API soon.

Signed-off-by: Marek Behún <marek.behun@nic.cz>
Cc: Heiner Kallweit <hkallweit1@gmail.com>
Cc: Sebastian Reichel <sebastian.reichel@collabora.co.uk>
Cc: Vivien Didelot <vivien.didelot@gmail.com>
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: David S. Miller <davem@davemloft.net>
---
 net/dsa/port.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

Comments

Andrew Lunn Aug. 11, 2019, 3:39 a.m. UTC | #1
On Sun, Aug 11, 2019 at 05:18:57AM +0200, Marek Behún wrote:
> Commit 88d6272acaaa ("net: phy: avoid unneeded MDIO reads in
> genphy_read_status") broke fixed link DSA port registration in
> dsa_port_fixed_link_register_of: the genphy_read_status does not do what
> it is supposed to and the following adjust_link is given wrong
> parameters.

Hi Marek

Which parameters are incorrect?

In fixed_phy.c, __fixed_phy_register() there is:

        /* propagate the fixed link values to struct phy_device */
        phy->link = status->link;
        if (status->link) {
                phy->speed = status->speed;
                phy->duplex = status->duplex;
                phy->pause = status->pause;
                phy->asym_pause = status->asym_pause;
        }

Are we not initialising something? Or is the initialisation done here
getting reset sometime afterwards?

Thanks
	Andrew
Sergei Shtylyov Aug. 11, 2019, 9:54 a.m. UTC | #2
Hello!

    Just noticed a comment typo...

On 11.08.2019 6:18, Marek Behún wrote:

> Commit 88d6272acaaa ("net: phy: avoid unneeded MDIO reads in
> genphy_read_status") broke fixed link DSA port registration in
> dsa_port_fixed_link_register_of: the genphy_read_status does not do what
> it is supposed to and the following adjust_link is given wrong
> parameters.
> 
> This causes a regression on Turris Omnia, where the mvneta driver for
> the interface connected to the switch reports crc errors, for some
> reason.
> 
> I realize this fix is not ideal, something else could change in genphy
> functions which could cause DSA fixed-link port to break again.
> Hopefully DSA fixed-link port functionality will be converted to phylink
> API soon.
> 
> Signed-off-by: Marek Behún <marek.behun@nic.cz>
> Cc: Heiner Kallweit <hkallweit1@gmail.com>
> Cc: Sebastian Reichel <sebastian.reichel@collabora.co.uk>
> Cc: Vivien Didelot <vivien.didelot@gmail.com>
> Cc: Andrew Lunn <andrew@lunn.ch>
> Cc: Florian Fainelli <f.fainelli@gmail.com>
> Cc: David S. Miller <davem@davemloft.net>
> ---
>   net/dsa/port.c | 11 +++++++++++
>   1 file changed, 11 insertions(+)
> 
> diff --git a/net/dsa/port.c b/net/dsa/port.c
> index 363eab6df51b..c424ebb373e1 100644
> --- a/net/dsa/port.c
> +++ b/net/dsa/port.c
> @@ -485,6 +485,17 @@ static int dsa_port_fixed_link_register_of(struct dsa_port *dp)
>   	phydev->interface = mode;
>   
>   	genphy_config_init(phydev);
> +
> +	/*
> +	 * Commit 88d6272acaaa caused genphy_read_status not to do it's work if
                                                                    ^^^^ its.

> +	 * autonegotiation is enabled and link status did not change. This is
> +	 * the case for fixed_phy. By setting phydev->link = 0 before the call
> +	 * to genphy_read_status we force it to read and fill in the parameters.
> +	 *
> +	 * Hopefully this dirty hack will be removed soon by converting DSA
> +	 * fixed link ports to phylink API.
> +	 */
> +	phydev->link = 0;
>   	genphy_read_status(phydev);
>   
>   	if (ds->ops->adjust_link)

MBR, Sergei
Heiner Kallweit Aug. 11, 2019, 11:35 a.m. UTC | #3
On 11.08.2019 05:39, Andrew Lunn wrote:
> On Sun, Aug 11, 2019 at 05:18:57AM +0200, Marek Behún wrote:
>> Commit 88d6272acaaa ("net: phy: avoid unneeded MDIO reads in
>> genphy_read_status") broke fixed link DSA port registration in
>> dsa_port_fixed_link_register_of: the genphy_read_status does not do what
>> it is supposed to and the following adjust_link is given wrong
>> parameters.
> 
> Hi Marek
> 
> Which parameters are incorrect?
> 
> In fixed_phy.c, __fixed_phy_register() there is:
> 
>         /* propagate the fixed link values to struct phy_device */
>         phy->link = status->link;
>         if (status->link) {
>                 phy->speed = status->speed;
>                 phy->duplex = status->duplex;
>                 phy->pause = status->pause;
>                 phy->asym_pause = status->asym_pause;
>         }
> 
> Are we not initialising something? Or is the initialisation done here
> getting reset sometime afterwards?
> 
In addition to Andrew's question:
We talk about this DT config: armada-385-turris-omnia.dts ?
Which kernel version are you using?

> Thanks
> 	Andrew
> 
Heiner
Marek Behún Aug. 11, 2019, 2:04 p.m. UTC | #4
OK guys, something is terribly wrong here.

I bisected to the commit mentioned (88d6272acaaa), looked around at the
genphy functions, tried adding the link=0 workaround and it did work,
so I though this was the issue.

What I realized now is that before the commit 88d6272acaaa things
worked because of two bugs, which negated each other. This commit caused
one of this bugs not to fire, and thus the second bug was not negated.

What actually happened before the commit that broke it is this:
  - after the fixed_phy is created, the parameters are corrent
  - genphy_read_status breaks the parameters:
     - first it sets the parameters to unknown (SPEED_UNKNOWN,
       DUPLEX_UNKNOWN)
     - then read the registers, which are simulated for fixed_phy
     - then it uses phy-core.c:phy_resolve_aneg_linkmode function, which
       looks for correct settings by bit-anding the ->advertising and
       ->lp_advertigins bit arrays. But in fixed_phy, ->lp_advertising
       is set to zero, so the parameters are left at SPEED_UNKNOWN, ...
       (this is the first bug)
  - then adjust_link is called, which then goes to
    mv88e6xxx_port_setup_mac, where there is a test if it should change
    something:
       if (state.link == link && state.speed == speed &&
           state.duplex == duplex)
               return 0;
  - since current speed on the switch port (state.speed) is SPEED_1000,
    and new speed is SPEED_UNKNOWN, this test fails, and so the rest of
    this function is called, which makes the port work
    (the if test is the second bug)

After the commit that broke things:
  - after the fixed_phy is created, the parameters are corrent
  - genphy_read_status doesn't change them
  - mv88e6xxx_port_setup_mac does nothing, since the if condition above
    is true

So, there are two things that are broken:
 - the test in mv88e6xxx_port_setup_mac whether there is to be a change
   should be more sophisticated
 - fixed_phy should also simulate the lp_advertising register

What do you think of this?

Marek

On Sun, 11 Aug 2019 13:35:20 +0200
Heiner Kallweit <hkallweit1@gmail.com> wrote:

> On 11.08.2019 05:39, Andrew Lunn wrote:
> > On Sun, Aug 11, 2019 at 05:18:57AM +0200, Marek Behún wrote:  
> >> Commit 88d6272acaaa ("net: phy: avoid unneeded MDIO reads in
> >> genphy_read_status") broke fixed link DSA port registration in
> >> dsa_port_fixed_link_register_of: the genphy_read_status does not do what
> >> it is supposed to and the following adjust_link is given wrong
> >> parameters.  
> > 
> > Hi Marek
> > 
> > Which parameters are incorrect?
> > 
> > In fixed_phy.c, __fixed_phy_register() there is:
> > 
> >         /* propagate the fixed link values to struct phy_device */
> >         phy->link = status->link;
> >         if (status->link) {
> >                 phy->speed = status->speed;
> >                 phy->duplex = status->duplex;
> >                 phy->pause = status->pause;
> >                 phy->asym_pause = status->asym_pause;
> >         }
> > 
> > Are we not initialising something? Or is the initialisation done here
> > getting reset sometime afterwards?
> >   
> In addition to Andrew's question:
> We talk about this DT config: armada-385-turris-omnia.dts ?
> Which kernel version are you using?
> 
> > Thanks
> > 	Andrew
> >   
> Heiner
Marek Behún Aug. 11, 2019, 3:08 p.m. UTC | #5
I have sent two new patches, each fixing one of the bugs that negated
each other.

Marek
Vladimir Oltean Aug. 11, 2019, 3:16 p.m. UTC | #6
Hi Marek,

On Sun, 11 Aug 2019 at 17:06, Marek Behun <marek.behun@nic.cz> wrote:
>
> OK guys, something is terribly wrong here.
>
> I bisected to the commit mentioned (88d6272acaaa), looked around at the
> genphy functions, tried adding the link=0 workaround and it did work,
> so I though this was the issue.
>
> What I realized now is that before the commit 88d6272acaaa things
> worked because of two bugs, which negated each other. This commit caused
> one of this bugs not to fire, and thus the second bug was not negated.
>
> What actually happened before the commit that broke it is this:
>   - after the fixed_phy is created, the parameters are corrent
>   - genphy_read_status breaks the parameters:
>      - first it sets the parameters to unknown (SPEED_UNKNOWN,
>        DUPLEX_UNKNOWN)
>      - then read the registers, which are simulated for fixed_phy
>      - then it uses phy-core.c:phy_resolve_aneg_linkmode function, which
>        looks for correct settings by bit-anding the ->advertising and
>        ->lp_advertigins bit arrays. But in fixed_phy, ->lp_advertising
>        is set to zero, so the parameters are left at SPEED_UNKNOWN, ...
>        (this is the first bug)
>   - then adjust_link is called, which then goes to
>     mv88e6xxx_port_setup_mac, where there is a test if it should change
>     something:
>        if (state.link == link && state.speed == speed &&
>            state.duplex == duplex)
>                return 0;
>   - since current speed on the switch port (state.speed) is SPEED_1000,
>     and new speed is SPEED_UNKNOWN, this test fails, and so the rest of
>     this function is called, which makes the port work
>     (the if test is the second bug)
>
> After the commit that broke things:
>   - after the fixed_phy is created, the parameters are corrent
>   - genphy_read_status doesn't change them
>   - mv88e6xxx_port_setup_mac does nothing, since the if condition above
>     is true
>
> So, there are two things that are broken:
>  - the test in mv88e6xxx_port_setup_mac whether there is to be a change
>    should be more sophisticated
>  - fixed_phy should also simulate the lp_advertising register
>
> What do you think of this?
>

I don't know. But I think Heiner was asking you what kernel you're on
because of what you said here:

> Hopefully DSA fixed-link port functionality will be converted to phylink
> API soon.

The DSA fixed-link port functionality *has* been converted to phylink.
See:
- https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/commit/?id=0e27921816ad9
- https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/commit/?id=7fb5a711545d7d25fe9726a9ad277474dd83bd06


> Marek
>
> On Sun, 11 Aug 2019 13:35:20 +0200
> Heiner Kallweit <hkallweit1@gmail.com> wrote:
>
> > On 11.08.2019 05:39, Andrew Lunn wrote:
> > > On Sun, Aug 11, 2019 at 05:18:57AM +0200, Marek Behún wrote:
> > >> Commit 88d6272acaaa ("net: phy: avoid unneeded MDIO reads in
> > >> genphy_read_status") broke fixed link DSA port registration in
> > >> dsa_port_fixed_link_register_of: the genphy_read_status does not do what
> > >> it is supposed to and the following adjust_link is given wrong
> > >> parameters.
> > >
> > > Hi Marek
> > >
> > > Which parameters are incorrect?
> > >
> > > In fixed_phy.c, __fixed_phy_register() there is:
> > >
> > >         /* propagate the fixed link values to struct phy_device */
> > >         phy->link = status->link;
> > >         if (status->link) {
> > >                 phy->speed = status->speed;
> > >                 phy->duplex = status->duplex;
> > >                 phy->pause = status->pause;
> > >                 phy->asym_pause = status->asym_pause;
> > >         }
> > >
> > > Are we not initialising something? Or is the initialisation done here
> > > getting reset sometime afterwards?
> > >
> > In addition to Andrew's question:
> > We talk about this DT config: armada-385-turris-omnia.dts ?
> > Which kernel version are you using?
> >
> > > Thanks
> > >     Andrew
> > >
> > Heiner
>

Regards,
-Vladimir
Andrew Lunn Aug. 11, 2019, 3:18 p.m. UTC | #7
On Sun, Aug 11, 2019 at 04:04:04PM +0200, Marek Behun wrote:
> OK guys, something is terribly wrong here.
> 
> I bisected to the commit mentioned (88d6272acaaa), looked around at the
> genphy functions, tried adding the link=0 workaround and it did work,
> so I though this was the issue.
> 
> What I realized now is that before the commit 88d6272acaaa things
> worked because of two bugs, which negated each other. This commit caused
> one of this bugs not to fire, and thus the second bug was not negated.
> 
> What actually happened before the commit that broke it is this:
>   - after the fixed_phy is created, the parameters are corrent
>   - genphy_read_status breaks the parameters:
>      - first it sets the parameters to unknown (SPEED_UNKNOWN,
>        DUPLEX_UNKNOWN)
>      - then read the registers, which are simulated for fixed_phy
>      - then it uses phy-core.c:phy_resolve_aneg_linkmode function, which
>        looks for correct settings by bit-anding the ->advertising and
>        ->lp_advertigins bit arrays. But in fixed_phy, ->lp_advertising
>        is set to zero, so the parameters are left at SPEED_UNKNOWN, ...
>        (this is the first bug)
>   - then adjust_link is called, which then goes to
>     mv88e6xxx_port_setup_mac, where there is a test if it should change
>     something:
>        if (state.link == link && state.speed == speed &&
>            state.duplex == duplex)
>                return 0;
>   - since current speed on the switch port (state.speed) is SPEED_1000,
>     and new speed is SPEED_UNKNOWN, this test fails, and so the rest of
>     this function is called, which makes the port work
>     (the if test is the second bug)
> 
> After the commit that broke things:
>   - after the fixed_phy is created, the parameters are corrent
>   - genphy_read_status doesn't change them
>   - mv88e6xxx_port_setup_mac does nothing, since the if condition above
>     is true
> 
> So, there are two things that are broken:
>  - the test in mv88e6xxx_port_setup_mac whether there is to be a change
>    should be more sophisticated
>  - fixed_phy should also simulate the lp_advertising register
> 
> What do you think of this?

Marek

This is the sort of information i like. I was having trouble
understanding what was really wrong and how it was fixed by your
previous patch.

So setting the emulated lp_advertise to advertise makes a lot of sense
for fixed phy. And it is something worthy of stable.

As for mv88e6xxx_port_setup_mac(), which parameter is actually
important here? My assumption was, if one of the other parameters
changes, it would not happen alone. The link would also go down, and
later come up again, etc. But it seems that assumption is wrong.

At a guess, it is the RGMII delays. That would explain CRC errors in
frames received by the master interface.

       Andrew
Marek Behún Aug. 11, 2019, 4:21 p.m. UTC | #8
On Sun, 11 Aug 2019 18:16:11 +0300
Vladimir Oltean <olteanv@gmail.com> wrote:

> The DSA fixed-link port functionality *has* been converted to phylink.
> See:
> - https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/commit/?id=0e27921816ad9
> - https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/commit/?id=7fb5a711545d7d25fe9726a9ad277474dd83bd06
>

/o\ my bad. I did not realize that I was working on 5.2 :(. Sorry.
diff mbox series

Patch

diff --git a/net/dsa/port.c b/net/dsa/port.c
index 363eab6df51b..c424ebb373e1 100644
--- a/net/dsa/port.c
+++ b/net/dsa/port.c
@@ -485,6 +485,17 @@  static int dsa_port_fixed_link_register_of(struct dsa_port *dp)
 	phydev->interface = mode;
 
 	genphy_config_init(phydev);
+
+	/*
+	 * Commit 88d6272acaaa caused genphy_read_status not to do it's work if
+	 * autonegotiation is enabled and link status did not change. This is
+	 * the case for fixed_phy. By setting phydev->link = 0 before the call
+	 * to genphy_read_status we force it to read and fill in the parameters.
+	 *
+	 * Hopefully this dirty hack will be removed soon by converting DSA
+	 * fixed link ports to phylink API.
+	 */
+	phydev->link = 0;
 	genphy_read_status(phydev);
 
 	if (ds->ops->adjust_link)