
net: phy: fix auto-negotiation in case of 'down-shift'

Message ID 20201124143848.874894-1-antonio.borneo@st.com
State Superseded
Series net: phy: fix auto-negotiation in case of 'down-shift'

Commit Message

Antonio Borneo Nov. 24, 2020, 2:38 p.m. UTC
If the auto-negotiation fails to establish a gigabit link, the phy
can try to 'down-shift': it resets the bits in MII_CTRL1000 to
stop advertising 1Gbps and retries the negotiation at 100Mbps.

From commit 5502b218e001 ("net: phy: use phy_resolve_aneg_linkmode
in genphy_read_status") the content of MII_CTRL1000 is not checked
anymore at the end of the negotiation, preventing the detection of
phy 'down-shift'.
In case of 'down-shift' phydev->advertising gets out-of-sync wrt
MII_CTRL1000 and still includes modes that the phy has already
dropped. The link partner could still advertise higher speeds,
while the link is established at one of the common lower speeds.
The logic 'and' in phy_resolve_aneg_linkmode() between
phydev->advertising and phydev->lp_advertising will report an
incorrect mode.

Issue detected with a local phy rtl8211f connected with a gigabit
capable router through a two-pairs network cable.

After auto-negotiation, read back MII_CTRL1000 and mask-out from
phydev->advertising the modes that may have been discarded
due to the 'down-shift'.

Fixes: 5502b218e001 ("net: phy: use phy_resolve_aneg_linkmode in genphy_read_status")
Cc: stable@vger.kernel.org # v5.1+
Signed-off-by: Antonio Borneo <antonio.borneo@st.com>
Link: https://lore.kernel.org/r/478f871a-583d-01f1-9cc5-2eea56d8c2a7@huawei.com
---
To: Andrew Lunn <andrew@lunn.ch>
To: Heiner Kallweit <hkallweit1@gmail.com>
To: Russell King <linux@armlinux.org.uk>
To: "David S. Miller" <davem@davemloft.net>
To: Jakub Kicinski <kuba@kernel.org>
To: netdev@vger.kernel.org
To: Yonglong Liu <liuyonglong@huawei.com>
Cc: linuxarm@huawei.com
Cc: Salil Mehta <salil.mehta@huawei.com>
Cc: linux-stm32@st-md-mailman.stormreply.com
Cc: linux-kernel@vger.kernel.org
Cc: Antonio Borneo <antonio.borneo@st.com>

 drivers/net/phy/phy_device.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)


base-commit: d549699048b4b5c22dd710455bcdb76966e55aa3
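
The mis-resolution described in the commit message can be sketched in
plain userspace C. This is only an illustration under stated
assumptions: the MODE_* bit values and resolve_speed() are hypothetical
names made up for this sketch, and the kernel itself works on linkmode
bitmaps indexed by ETHTOOL_LINK_MODE_* bits rather than a flat integer.

```c
#include <stdint.h>

/* Hypothetical bit assignments, used by this sketch only. */
enum {
	MODE_10_HALF   = 1u << 0,
	MODE_10_FULL   = 1u << 1,
	MODE_100_HALF  = 1u << 2,
	MODE_100_FULL  = 1u << 3,
	MODE_1000_HALF = 1u << 4,
	MODE_1000_FULL = 1u << 5,
};

#define MODE_1000_ANY (MODE_1000_HALF | MODE_1000_FULL)
#define MODE_100_ANY  (MODE_100_HALF | MODE_100_FULL)
#define MODE_10_ANY   (MODE_10_HALF | MODE_10_FULL)
#define MODE_ALL      (MODE_1000_ANY | MODE_100_ANY | MODE_10_ANY)

/*
 * Return the speed in Mbps of the fastest mode present in both masks,
 * or 0 if the masks share no mode. This mirrors the "logic and"
 * between the local and link-partner advertisements.
 */
int resolve_speed(uint32_t advertising, uint32_t lp_advertising)
{
	uint32_t common = advertising & lp_advertising;

	if (common & MODE_1000_ANY)
		return 1000;
	if (common & MODE_100_ANY)
		return 100;
	if (common & MODE_10_ANY)
		return 10;
	return 0;
}
```

With a stale local mask, resolve_speed(MODE_ALL, MODE_ALL) yields 1000
even though the down-shifted link runs at 100Mbps. Passing the mask the
phy actually advertises after the down-shift,
resolve_speed(MODE_ALL & ~MODE_1000_ANY, MODE_ALL), yields 100.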

Comments

Russell King (Oracle) Nov. 24, 2020, 2:56 p.m. UTC | #1
On Tue, Nov 24, 2020 at 03:38:48PM +0100, Antonio Borneo wrote:
> If the auto-negotiation fails to establish a gigabit link, the phy
> can try to 'down-shift': it resets the bits in MII_CTRL1000 to
> stop advertising 1Gbps and retries the negotiation at 100Mbps.
> 
> From commit 5502b218e001 ("net: phy: use phy_resolve_aneg_linkmode
> in genphy_read_status") the content of MII_CTRL1000 is not checked
> anymore at the end of the negotiation, preventing the detection of
> phy 'down-shift'.
> In case of 'down-shift' phydev->advertising gets out-of-sync wrt
> MII_CTRL1000 and still includes modes that the phy has already
> dropped. The link partner could still advertise higher speeds,
> while the link is established at one of the common lower speeds.
> The logic 'and' in phy_resolve_aneg_linkmode() between
> phydev->advertising and phydev->lp_advertising will report an
> incorrect mode.
> 
> Issue detected with a local phy rtl8211f connected with a gigabit
> capable router through a two-pairs network cable.
> 
> After auto-negotiation, read back MII_CTRL1000 and mask-out from
> phydev->advertising the modes that have been eventually discarded
> due to the 'down-shift'.

Sorry, but no. While your solution will appear to work, it
introduces unexpected changes to the user-visible APIs.

>  	if (phydev->autoneg == AUTONEG_ENABLE && phydev->autoneg_complete) {
> +		if (phydev->is_gigabit_capable) {
> +			adv = phy_read(phydev, MII_CTRL1000);
> +			if (adv < 0)
> +				return adv;
> +			/* update advertising in case of 'down-shift' */
> +			mii_ctrl1000_mod_linkmode_adv_t(phydev->advertising,
> +							adv);

If a down-shift occurs, this will cause the configured advertising
mask to lose the 1G speed, which will be visible to userspace.
Userspace doesn't expect the advertising mask to change beneath it.
Since updates from userspace are done using a read-modify-write of
the ksettings, this can have the undesired effect of removing 1G
from the configured advertising mask.

We've had other PHYs have this behaviour; the correct solution is for
the PHY driver to implement reading the resolution from the PHY rather
than relying on the generic implementation if it can down-shift.
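
The read-modify-write hazard described above can be sketched in
userspace C. Everything here is a simplified model rather than kernel
or ethtool code: struct phy_sim, the MODE_* bits and the helper names
are hypothetical and exist only for this illustration.

```c
#include <stdint.h>

/* Hypothetical bit assignments, used by this sketch only. */
enum {
	MODE_10_HALF   = 1u << 0,
	MODE_10_FULL   = 1u << 1,
	MODE_100_HALF  = 1u << 2,
	MODE_100_FULL  = 1u << 3,
	MODE_1000_FULL = 1u << 5,
};

/* Stand-in for the configured advertising mask kept by the kernel. */
struct phy_sim {
	uint32_t advertising;
};

/* Model of the proposed patch: a down-shift clears 1G from the mask. */
void sim_downshift_updates_mask(struct phy_sim *phy)
{
	phy->advertising &= ~MODE_1000_FULL;
}

/* Model of a userspace tool: read settings, drop one mode, write back. */
void sim_user_disable_mode(struct phy_sim *phy, uint32_t mode)
{
	uint32_t adv = phy->advertising;	/* read */

	adv &= ~mode;				/* modify */
	phy->advertising = adv;			/* write */
}
```

If the mask starts with all modes set, a down-shift followed by
sim_user_disable_mode(&phy, MODE_10_HALF) writes back a mask without
MODE_1000_FULL, although the user only asked to drop 10M half duplex.
The written value is now the configured advertisement, so 1G stays
disabled even after the cable is replaced.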
Heiner Kallweit Nov. 24, 2020, 3:03 p.m. UTC | #2
Am 24.11.2020 um 15:38 schrieb Antonio Borneo:
> If the auto-negotiation fails to establish a gigabit link, the phy
> can try to 'down-shift': it resets the bits in MII_CTRL1000 to
> stop advertising 1Gbps and retries the negotiation at 100Mbps.
> 
I see that Russell answered already. My 2cts:

Are you sure all PHYs supporting downshift adjust the
advertisement bits? IIRC an Aquantia PHY I dealt with does not.
And if a PHY does so I'd consider this problematic:
Let's say you have a broken cable and the PHY downshifts to
100Mbps. If you change the cable then the PHY would still negotiate
100Mbps only.

Also I think phydev->advertising reflects what the user wants to
advertise, as mentioned by Russell before.


> From commit 5502b218e001 ("net: phy: use phy_resolve_aneg_linkmode
> in genphy_read_status") the content of MII_CTRL1000 is not checked
> anymore at the end of the negotiation, preventing the detection of
> phy 'down-shift'.
> In case of 'down-shift' phydev->advertising gets out-of-sync wrt
> MII_CTRL1000 and still includes modes that the phy has already
> dropped. The link partner could still advertise higher speeds,
> while the link is established at one of the common lower speeds.
> The logic 'and' in phy_resolve_aneg_linkmode() between
> phydev->advertising and phydev->lp_advertising will report an
> incorrect mode.
> 
> Issue detected with a local phy rtl8211f connected with a gigabit
> capable router through a two-pairs network cable.
> 
> After auto-negotiation, read back MII_CTRL1000 and mask-out from
> phydev->advertising the modes that have been eventually discarded
> due to the 'down-shift'.
> 
> Fixes: 5502b218e001 ("net: phy: use phy_resolve_aneg_linkmode in genphy_read_status")
> Cc: stable@vger.kernel.org # v5.1+
> Signed-off-by: Antonio Borneo <antonio.borneo@st.com>
> Link: https://lore.kernel.org/r/478f871a-583d-01f1-9cc5-2eea56d8c2a7@huawei.com
> ---
> To: Andrew Lunn <andrew@lunn.ch>
> To: Heiner Kallweit <hkallweit1@gmail.com>
> To: Russell King <linux@armlinux.org.uk>
> To: "David S. Miller" <davem@davemloft.net>
> To: Jakub Kicinski <kuba@kernel.org>
> To: netdev@vger.kernel.org
> To: Yonglong Liu <liuyonglong@huawei.com>
> Cc: linuxarm@huawei.com
> Cc: Salil Mehta <salil.mehta@huawei.com>
> Cc: linux-stm32@st-md-mailman.stormreply.com
> Cc: linux-kernel@vger.kernel.org
> Cc: Antonio Borneo <antonio.borneo@st.com>
> 
>  drivers/net/phy/phy_device.c | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
> index 5dab6be6fc38..5d1060aa1b25 100644
> --- a/drivers/net/phy/phy_device.c
> +++ b/drivers/net/phy/phy_device.c
> @@ -2331,7 +2331,7 @@ EXPORT_SYMBOL(genphy_read_status_fixed);
>   */
>  int genphy_read_status(struct phy_device *phydev)
>  {
> -	int err, old_link = phydev->link;
> +	int adv, err, old_link = phydev->link;
>  
>  	/* Update the link, but return if there was an error */
>  	err = genphy_update_link(phydev);
> @@ -2356,6 +2356,14 @@ int genphy_read_status(struct phy_device *phydev)
>  		return err;
>  
>  	if (phydev->autoneg == AUTONEG_ENABLE && phydev->autoneg_complete) {
> +		if (phydev->is_gigabit_capable) {
> +			adv = phy_read(phydev, MII_CTRL1000);
> +			if (adv < 0)
> +				return adv;
> +			/* update advertising in case of 'down-shift' */
> +			mii_ctrl1000_mod_linkmode_adv_t(phydev->advertising,
> +							adv);
> +		}
>  		phy_resolve_aneg_linkmode(phydev);
>  	} else if (phydev->autoneg == AUTONEG_DISABLE) {
>  		err = genphy_read_status_fixed(phydev);
> 
> base-commit: d549699048b4b5c22dd710455bcdb76966e55aa3
>
Russell King (Oracle) Nov. 24, 2020, 3:17 p.m. UTC | #3
On Tue, Nov 24, 2020 at 04:03:40PM +0100, Heiner Kallweit wrote:
> Am 24.11.2020 um 15:38 schrieb Antonio Borneo:
> > If the auto-negotiation fails to establish a gigabit link, the phy
> > can try to 'down-shift': it resets the bits in MII_CTRL1000 to
> > stop advertising 1Gbps and retries the negotiation at 100Mbps.
> > 
> I see that Russell answered already. My 2cts:
> 
> Are you sure all PHYs supporting downshift adjust the
> advertisement bits? IIRC an Aquantia PHY I dealt with does not.
> And if a PHY does so I'd consider this problematic:
> Let's say you have a broken cable and the PHY downshifts to
> 100Mbps. If you change the cable then the PHY would still negotiate
> 100Mbps only.

From what I've seen, that is not how downshift works, at least on
the PHYs I've seen.

When the PHY downshifts, it modifies the advertisement registers,
but it also remembers the original value. When the cable is
unplugged, it restores the setting to what was previously set.

It is _far_ from nice, but the fact is that your patch that Antonio
identified has broken previously working support, something that I
brought up when I patched one of the PHY drivers that was broken by
this very same problem by your patch.

That said, _if_ the PHY has a way to read the resolved state rather
than reading the advertisement registers, that is what should be
used (as I said previously) rather than trying to decode the
advertisement registers ourselves. That is normally more reliable
for speed and duplex.
Antonio Borneo Nov. 24, 2020, 3:17 p.m. UTC | #4
On Tue, 2020-11-24 at 14:56 +0000, Russell King - ARM Linux admin wrote:
> On Tue, Nov 24, 2020 at 03:38:48PM +0100, Antonio Borneo wrote:
> > If the auto-negotiation fails to establish a gigabit link, the phy
> > can try to 'down-shift': it resets the bits in MII_CTRL1000 to
> > stop advertising 1Gbps and retries the negotiation at 100Mbps.
> > 
> > From commit 5502b218e001 ("net: phy: use phy_resolve_aneg_linkmode
> > in genphy_read_status") the content of MII_CTRL1000 is not checked
> > anymore at the end of the negotiation, preventing the detection of
> > phy 'down-shift'.
> > In case of 'down-shift' phydev->advertising gets out-of-sync wrt
> > MII_CTRL1000 and still includes modes that the phy has already
> > dropped. The link partner could still advertise higher speeds,
> > while the link is established at one of the common lower speeds.
> > The logic 'and' in phy_resolve_aneg_linkmode() between
> > phydev->advertising and phydev->lp_advertising will report an
> > incorrect mode.
> > 
> > Issue detected with a local phy rtl8211f connected with a gigabit
> > capable router through a two-pairs network cable.
> > 
> > After auto-negotiation, read back MII_CTRL1000 and mask-out from
> > phydev->advertising the modes that have been eventually discarded
> > due to the 'down-shift'.
> 
> Sorry, but no. While your solution will appear to work, it
> introduces unexpected changes to the user-visible APIs.
> 
> >  	if (phydev->autoneg == AUTONEG_ENABLE && phydev->autoneg_complete) {
> > +		if (phydev->is_gigabit_capable) {
> > +			adv = phy_read(phydev, MII_CTRL1000);
> > +			if (adv < 0)
> > +				return adv;
> > +			/* update advertising in case of 'down-shift' */
> > +			mii_ctrl1000_mod_linkmode_adv_t(phydev->advertising,
> > +							adv);
> 
> If a down-shift occurs, this will cause the configured advertising
> mask to lose the 1G speed, which will be visible to userspace.

You are right, it gets propagated to the user that 1Gbps is not advertised.

> Userspace doesn't expect the advertising mask to change beneath it.
> Since updates from userspace are done using a read-modify-write of
> the ksettings, this can have the undesired effect of removing 1G
> from the configured advertising mask.
> 
> We've had other PHYs have this behaviour; the correct solution is for
> the PHY driver to implement reading the resolution from the PHY rather
> than relying on the generic implementation if it can down-shift

If it's already upstream, could you please point to one of the phy drivers
that already implements this properly?

Thanks
Antonio
Heiner Kallweit Nov. 24, 2020, 3:26 p.m. UTC | #5
Am 24.11.2020 um 16:17 schrieb Antonio Borneo:
> On Tue, 2020-11-24 at 14:56 +0000, Russell King - ARM Linux admin wrote:
>> On Tue, Nov 24, 2020 at 03:38:48PM +0100, Antonio Borneo wrote:
>>> If the auto-negotiation fails to establish a gigabit link, the phy
>>> can try to 'down-shift': it resets the bits in MII_CTRL1000 to
>>> stop advertising 1Gbps and retries the negotiation at 100Mbps.
>>>
>>> From commit 5502b218e001 ("net: phy: use phy_resolve_aneg_linkmode
>>> in genphy_read_status") the content of MII_CTRL1000 is not checked
>>> anymore at the end of the negotiation, preventing the detection of
>>> phy 'down-shift'.
>>> In case of 'down-shift' phydev->advertising gets out-of-sync wrt
>>> MII_CTRL1000 and still includes modes that the phy has already
>>> dropped. The link partner could still advertise higher speeds,
>>> while the link is established at one of the common lower speeds.
>>> The logic 'and' in phy_resolve_aneg_linkmode() between
>>> phydev->advertising and phydev->lp_advertising will report an
>>> incorrect mode.
>>>
>>> Issue detected with a local phy rtl8211f connected with a gigabit
>>> capable router through a two-pairs network cable.
>>>
>>> After auto-negotiation, read back MII_CTRL1000 and mask-out from
>>> phydev->advertising the modes that have been eventually discarded
>>> due to the 'down-shift'.
>>
>> Sorry, but no. While your solution will appear to work, it
>> introduces unexpected changes to the user-visible APIs.
>>
>>>  	if (phydev->autoneg == AUTONEG_ENABLE && phydev->autoneg_complete) {
>>> +		if (phydev->is_gigabit_capable) {
>>> +			adv = phy_read(phydev, MII_CTRL1000);
>>> +			if (adv < 0)
>>> +				return adv;
>>> +			/* update advertising in case of 'down-shift' */
>>> +			mii_ctrl1000_mod_linkmode_adv_t(phydev->advertising,
>>> +							adv);
>>
>> If a down-shift occurs, this will cause the configured advertising
>> mask to lose the 1G speed, which will be visible to userspace.
> 
> You are right, it gets propagated to the user that 1Gbps is not advertised.
> 
>> Userspace doesn't expect the advertising mask to change beneath it.
>> Since updates from userspace are done using a read-modify-write of
>> the ksettings, this can have the undesired effect of removing 1G
>> from the configured advertising mask.
>>
>> We've had other PHYs have this behaviour; the correct solution is for
>> the PHY driver to implement reading the resolution from the PHY rather
>> than relying on the generic implementation if it can down-shift
> 
> If it's already upstream, could you please point to one of the phy drivers
> that already implements this properly?
> 

See e.g. aqr107_read_rate(), used by aqr107_read_status().

> Thanks
> Antonio
>
Antonio Borneo Nov. 24, 2020, 3:31 p.m. UTC | #6
On Tue, 2020-11-24 at 15:17 +0000, Russell King - ARM Linux admin wrote:
> On Tue, Nov 24, 2020 at 04:03:40PM +0100, Heiner Kallweit wrote:
> > Am 24.11.2020 um 15:38 schrieb Antonio Borneo:
> > > If the auto-negotiation fails to establish a gigabit link, the phy
> > > can try to 'down-shift': it resets the bits in MII_CTRL1000 to
> > > stop advertising 1Gbps and retries the negotiation at 100Mbps.
> > > 
> > I see that Russell answered already. My 2cts:
> > 
> > Are you sure all PHYs supporting downshift adjust the
> > advertisement bits? IIRC an Aquantia PHY I dealt with does not.
> > And if a PHY does so I'd consider this problematic:
> > Let's say you have a broken cable and the PHY downshifts to
> > 100Mbps. If you change the cable then the PHY would still negotiate
> > 100Mbps only.
> 
> From what I've seen, that is not how downshift works, at least on
> the PHYs I've seen.
> 
> When the PHY downshifts, it modifies the advertisement registers,
> but it also remembers the original value. When the cable is
> unplugged, it restores the setting to what was previously set.

In fact, at least rtl8211f is able to recover the original settings and
returns to 1Gbps once a decent cable gets plugged-in.

> 
> It is _far_ from nice, but the fact is that your patch that Antonio
> identified has broken previously working support, something that I
> brought up when I patched one of the PHY drivers that was broken by
> this very same problem by your patch.

The idea to fix it for a general case was indeed triggered by the fact that
before commit 5502b218e001 this was the norm. I considered it as a
regression.

> 
> That said, _if_ the PHY has a way to read the resolved state rather
> than reading the advertisement registers, that is what should be
> used (as I said previously) rather than trying to decode the
> advertisement registers ourselves. That is normally more reliable
> for speed and duplex.
> 

Wrt rtl8211f I don't have info other than the public datasheet, and there I
didn't find any way other than reading the advertisement register.

I have read the latest comment from Heiner. I will check aqr107!

Thanks
Antonio
Russell King (Oracle) Nov. 24, 2020, 3:37 p.m. UTC | #7
On Tue, Nov 24, 2020 at 04:17:42PM +0100, Antonio Borneo wrote:
> On Tue, 2020-11-24 at 14:56 +0000, Russell King - ARM Linux admin wrote:
> > Userspace doesn't expect the advertising mask to change beneath it.
> > Since updates from userspace are done using a read-modify-write of
> > the ksettings, this can have the undesired effect of removing 1G
> > from the configured advertising mask.
> > 
> > We've had other PHYs have this behaviour; the correct solution is for
> > the PHY driver to implement reading the resolution from the PHY rather
> > than relying on the generic implementation if it can down-shift
> 
> > If it's already upstream, could you please point to one of the phy drivers
> > that already implements this properly?

Reading the resolved information is PHY specific as it isn't
standardised.

Marvell PHYs have read the resolved information for a very long time.
I added support for it to at803x.c:

06d5f3441b2e net: phy: at803x: use operating parameters from PHY-specific status

after it broke for exactly the reason you're reporting for your PHY.
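
As a rough illustration of what "reading the resolved information"
means, here is a userspace C sketch that decodes a vendor-specific
status word. The register layout is entirely hypothetical (the SS_*
bit positions, struct link_state and decode_specific_status() are
invented for this example and not taken from any datasheet or driver);
a real driver would use the layout documented for its PHY.

```c
#include <stdint.h>

/* Hypothetical status register layout, for illustration only. */
#define SS_SPEED_MASK   (3u << 14)
#define SS_SPEED_10     (0u << 14)
#define SS_SPEED_100    (1u << 14)
#define SS_SPEED_1000   (2u << 14)
#define SS_DUPLEX_FULL  (1u << 13)
#define SS_RESOLVED     (1u << 11)

struct link_state {
	int speed;		/* Mbps */
	int full_duplex;	/* 1 = full, 0 = half */
};

/*
 * Decode the resolved speed and duplex from the status word.
 * Returns 0 and fills *st on success, or -1 if the PHY has not
 * resolved the link yet or reports a reserved speed encoding.
 */
int decode_specific_status(uint16_t ss, struct link_state *st)
{
	if (!(ss & SS_RESOLVED))
		return -1;

	switch (ss & SS_SPEED_MASK) {
	case SS_SPEED_1000:
		st->speed = 1000;
		break;
	case SS_SPEED_100:
		st->speed = 100;
		break;
	case SS_SPEED_10:
		st->speed = 10;
		break;
	default:
		return -1;
	}

	st->full_duplex = !!(ss & SS_DUPLEX_FULL);
	return 0;
}
```

The point of this design is that the result comes from what the PHY
actually negotiated, so the configured advertising mask is never
consulted or modified, and a down-shift is reported correctly without
changing what userspace sees as the advertisement.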
David Laight Nov. 24, 2020, 3:46 p.m. UTC | #8
From: Russell King
> Sent: 24 November 2020 15:17
...
> That said, _if_ the PHY has a way to read the resolved state rather
> than reading the advertisement registers, that is what should be
> used (as I said previously) rather than trying to decode the
> advertisement registers ourselves. That is normally more reliable
> for speed and duplex.

Determining the speed and duplex from the ANAR and ANRR (I can't
remember the name of the response register) has always been
completely broken.

The problems arise when you connect to either a 10M hub or
a 10/100M autodetecting hub (these are a 10M hub and a 100M hub
connected by a bridge).
The PHY will either see single link test pulses (10M hub) or
a simple burst of link test pulses (10/100 hub) and fall back
to 10M HDX or 100M HDX.
Both the 10M hub and 10/100 hub are happy with the link test
pulse stream that contains the ANAR.
However the ANRR register will (typically) contain the value
from the last system that sent it one.
So if you unplug from something that does 100M FDX and plug into
a hub the MAC unit is likely to be misconfigured and do FDX.

Of course, there is no generic way to get the actual mode.
I'm not sure the PHY I was using (a long time ago) even had
any private register that could tell you.

For one system (which was never going to do anything fast)
I just removed the FDX modes from the ANAR.
The MAC didn't care whether it was 10M or 100M.

	David

Antonio Borneo Nov. 24, 2020, 5 p.m. UTC | #9
On Tue, 2020-11-24 at 15:37 +0000, Russell King - ARM Linux admin wrote:
> On Tue, Nov 24, 2020 at 04:17:42PM +0100, Antonio Borneo wrote:
> > On Tue, 2020-11-24 at 14:56 +0000, Russell King - ARM Linux admin wrote:
> > > Userspace doesn't expect the advertising mask to change beneath it.
> > > Since updates from userspace are done using a read-modify-write of
> > > the ksettings, this can have the undesired effect of removing 1G
> > > from the configured advertising mask.
> > > 
> > > We've had other PHYs have this behaviour; the correct solution is for
> > > the PHY driver to implement reading the resolution from the PHY rather
> > > than relying on the generic implementation if it can down-shift
> > 
> > > If it's already upstream, could you please point to one of the phy drivers
> > > that already implements this properly?
> 
> Reading the resolved information is PHY specific as it isn't
> standardised.

Digging in the info you have provided, I realized that another Realtek PHY
has some specific code already upstream to deal with downshift.
The PHY specific code is added by Heiner in d445dff2df60 ("net: phy:
realtek: read actual speed to detect downshift").
This code reads the actual speed from page 0xa43 address 0x12, which is not
documented in the datasheet of rtl8211f.
But I checked the register content in rtl8211f and it works the same way
too!

I have added Willy in copy; maybe he can confirm that we can use page 0xa43
address 0x12 on rtl8211f to read the actual speed after negotiation.

In such case the fix for rtl8211f requires just adding the same custom
read_status().

Antonio

Patch

diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index 5dab6be6fc38..5d1060aa1b25 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -2331,7 +2331,7 @@  EXPORT_SYMBOL(genphy_read_status_fixed);
  */
 int genphy_read_status(struct phy_device *phydev)
 {
-	int err, old_link = phydev->link;
+	int adv, err, old_link = phydev->link;
 
 	/* Update the link, but return if there was an error */
 	err = genphy_update_link(phydev);
@@ -2356,6 +2356,14 @@  int genphy_read_status(struct phy_device *phydev)
 		return err;
 
 	if (phydev->autoneg == AUTONEG_ENABLE && phydev->autoneg_complete) {
+		if (phydev->is_gigabit_capable) {
+			adv = phy_read(phydev, MII_CTRL1000);
+			if (adv < 0)
+				return adv;
+			/* update advertising in case of 'down-shift' */
+			mii_ctrl1000_mod_linkmode_adv_t(phydev->advertising,
+							adv);
+		}
 		phy_resolve_aneg_linkmode(phydev);
 	} else if (phydev->autoneg == AUTONEG_DISABLE) {
 		err = genphy_read_status_fixed(phydev);