diff mbox series

Regression: mv88e6xxx packet loss after 4.18's PHYLINK merge

Message ID 20190114113444.GB16632@samu-ThinkPad-T480s
State RFC
Delegated to: David Miller
Headers show
Series Regression: mv88e6xxx packet loss after 4.18's PHYLINK merge | expand

Commit Message

Samu Nuutamo Jan. 14, 2019, 11:34 a.m. UTC
Hi,

We have an imx6q-b450v3 board that has one of the switch ports configured as a
fixed link. After upgrading the kernel to version 4.18 the link has experienced
a small amount of packet loss, around 0.15%.

The issue was bisected to a commit: aab9c4067d23 net: dsa: Plug in PHYLINK support

After enabling debug we could see that the new phylink code causes the link to
reset once every second:

[  309.992368] mv88e6085 gpio-0:00 enembc: phylink_mac_config: mode=fixed//100Mbps/Full adv=00000,00000208 pause=10 link=1 an=1
[  309.998451] mv88e6085 gpio-0:00: p5: Force link down
[  309.998869] mv88e6085 gpio-0:00: p5: Speed set to 100 Mbps
[  309.999280] mv88e6085 gpio-0:00: p5: Force full duplex
[  309.999895] mv88e6085 gpio-0:00: p5: Force link up
[  311.032400] mv88e6085 gpio-0:00 enembc: phylink_mac_config: mode=fixed//100Mbps/Full adv=00000,00000208 pause=10 link=1 an=1
[  311.038248] mv88e6085 gpio-0:00: p5: Force link down
[  311.038660] mv88e6085 gpio-0:00: p5: Speed set to 100 Mbps
[  311.039069] mv88e6085 gpio-0:00: p5: Force full duplex
[  311.039678] mv88e6085 gpio-0:00: p5: Force link up
[  312.072328] mv88e6085 gpio-0:00 enembc: phylink_mac_config: mode=fixed//100Mbps/Full adv=00000,00000208 pause=10 link=1 an=1
[  312.077603] mv88e6085 gpio-0:00: p5: Force link down
[  312.078010] mv88e6085 gpio-0:00: p5: Speed set to 100 Mbps
[  312.078417] mv88e6085 gpio-0:00: p5: Force full duplex
[  312.079026] mv88e6085 gpio-0:00: p5: Force link up

It's probably not a coincidence that the link downtime comes out as 0.15%
matching the packet loss percentage. I patched the driver code so that the
reset is skipped if the link state is already set:



The patched version did not have any packet loss, but it's probably not the
correct way of fixing the issue.


Best regards,
Samu Nuutamo

Comments

Andrew Lunn Jan. 14, 2019, 2:23 p.m. UTC | #1
On Mon, Jan 14, 2019 at 01:34:44PM +0200, Samu Nuutamo wrote:
> Hi,
> 
> We have an imx6q-b450v3 board that has one of the switch ports configured as a
> fixed link. After upgrading the kernel to version 4.18 the link has experienced
> a small amount of packet loss, around 0.15%.
> 
> The issue was bisected to a commit: aab9c4067d23 net: dsa: Plug in PHYLINK support
> 
> After enabling debug we could see that the new phylink code causes the link to
> reset once every second:
> 
> [  309.992368] mv88e6085 gpio-0:00 enembc: phylink_mac_config: mode=fixed//100Mbps/Full adv=00000,00000208 pause=10 link=1 an=1
> [  309.998451] mv88e6085 gpio-0:00: p5: Force link down
> [  309.998869] mv88e6085 gpio-0:00: p5: Speed set to 100 Mbps
> [  309.999280] mv88e6085 gpio-0:00: p5: Force full duplex
> [  309.999895] mv88e6085 gpio-0:00: p5: Force link up
> [  311.032400] mv88e6085 gpio-0:00 enembc: phylink_mac_config: mode=fixed//100Mbps/Full adv=00000,00000208 pause=10 link=1 an=1
> [  311.038248] mv88e6085 gpio-0:00: p5: Force link down
> [  311.038660] mv88e6085 gpio-0:00: p5: Speed set to 100 Mbps
> [  311.039069] mv88e6085 gpio-0:00: p5: Force full duplex
> [  311.039678] mv88e6085 gpio-0:00: p5: Force link up
> [  312.072328] mv88e6085 gpio-0:00 enembc: phylink_mac_config: mode=fixed//100Mbps/Full adv=00000,00000208 pause=10 link=1 an=1
> [  312.077603] mv88e6085 gpio-0:00: p5: Force link down
> [  312.078010] mv88e6085 gpio-0:00: p5: Speed set to 100 Mbps
> [  312.078417] mv88e6085 gpio-0:00: p5: Force full duplex
> [  312.079026] mv88e6085 gpio-0:00: p5: Force link up

Hi Samu

Thanks for the report.

I think we should be fixing this from the other end. Why does PHYLINK
reset the link once per second? If you have time, please could you
investigate that.

	    Thanks
		Andrew
Florian Fainelli Jan. 14, 2019, 6:03 p.m. UTC | #2
On 1/14/19 6:23 AM, Andrew Lunn wrote:
> On Mon, Jan 14, 2019 at 01:34:44PM +0200, Samu Nuutamo wrote:
>> Hi,
>>
>> We have an imx6q-b450v3 board that has one of the switch ports configured as a
>> fixed link. After upgrading the kernel to version 4.18 the link has experienced
>> a small amount of packet loss, around 0.15%.
>>
>> The issue was bisected to a commit: aab9c4067d23 net: dsa: Plug in PHYLINK support
>>
>> After enabling debug we could see that the new phylink code causes the link to
>> reset once every second:
>>
>> [  309.992368] mv88e6085 gpio-0:00 enembc: phylink_mac_config: mode=fixed//100Mbps/Full adv=00000,00000208 pause=10 link=1 an=1
>> [  309.998451] mv88e6085 gpio-0:00: p5: Force link down
>> [  309.998869] mv88e6085 gpio-0:00: p5: Speed set to 100 Mbps
>> [  309.999280] mv88e6085 gpio-0:00: p5: Force full duplex
>> [  309.999895] mv88e6085 gpio-0:00: p5: Force link up
>> [  311.032400] mv88e6085 gpio-0:00 enembc: phylink_mac_config: mode=fixed//100Mbps/Full adv=00000,00000208 pause=10 link=1 an=1
>> [  311.038248] mv88e6085 gpio-0:00: p5: Force link down
>> [  311.038660] mv88e6085 gpio-0:00: p5: Speed set to 100 Mbps
>> [  311.039069] mv88e6085 gpio-0:00: p5: Force full duplex
>> [  311.039678] mv88e6085 gpio-0:00: p5: Force link up
>> [  312.072328] mv88e6085 gpio-0:00 enembc: phylink_mac_config: mode=fixed//100Mbps/Full adv=00000,00000208 pause=10 link=1 an=1
>> [  312.077603] mv88e6085 gpio-0:00: p5: Force link down
>> [  312.078010] mv88e6085 gpio-0:00: p5: Speed set to 100 Mbps
>> [  312.078417] mv88e6085 gpio-0:00: p5: Force full duplex
>> [  312.079026] mv88e6085 gpio-0:00: p5: Force link up
> 
> Hi Samu
> 
> Thanks for the report.
> 
> I think we should be fixing this from the other end. Why does PHYLINK
> reset the link once per second? If you have time, please could you
> investigate that.

Samu, is this fixed link using a GPIO to determine the link state? Does
this patch help at all?

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c
b/drivers/net/dsa/mv88e6xxx/chip.c
index 8a517d8fb9d1..9dedb9b22c13 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -721,7 +721,7 @@ static void mv88e6xxx_mac_config(struct dsa_switch
*ds, int port,
                return;

        if (mode == MLO_AN_FIXED) {
-               link = LINK_FORCED_UP;
+               link = state->link;
                speed = state->speed;
                duplex = state->duplex;
        } else if (!mv88e6xxx_phy_is_internal(ds, port)) {
Samu Nuutamo Jan. 15, 2019, 8:15 a.m. UTC | #3
On Mon, Jan 14, 2019 at 10:03:57AM -0800, Florian Fainelli wrote:
> Samu, is this fixed link using a GPIO to determine the link state? Does
> this patch help at all?
> 
> diff --git a/drivers/net/dsa/mv88e6xxx/chip.c
> b/drivers/net/dsa/mv88e6xxx/chip.c
> index 8a517d8fb9d1..9dedb9b22c13 100644
> --- a/drivers/net/dsa/mv88e6xxx/chip.c
> +++ b/drivers/net/dsa/mv88e6xxx/chip.c
> @@ -721,7 +721,7 @@ static void mv88e6xxx_mac_config(struct dsa_switch
> *ds, int port,
>                 return;
> 
>         if (mode == MLO_AN_FIXED) {
> -               link = LINK_FORCED_UP;
> +               link = state->link;
>                 speed = state->speed;
>                 duplex = state->duplex;
>         } else if (!mv88e6xxx_phy_is_internal(ds, port)) {

Hi Florian,

I'm not 100% sure but it looks like GPIO is not used for that purpose.
I tested your patch but unfortunately it didn't work as the link keeps
cycling down/up.

- Samu
Andrew Lunn Jan. 17, 2019, 2:29 a.m. UTC | #4
On Mon, Jan 14, 2019 at 01:34:44PM +0200, Samu Nuutamo wrote:
> Hi,
> 
> We have an imx6q-b450v3 board that has one of the switch ports configured as a
> fixed link. After upgrading the kernel to version 4.18 the link has experienced
> a small amount of packet loss, around 0.15%.
> 
> The issue was bisected to a commit: aab9c4067d23 net: dsa: Plug in PHYLINK support
> 
> After enabling debug we could see that the new phylink code causes the link to
> reset once every second:
> 
> [  309.992368] mv88e6085 gpio-0:00 enembc: phylink_mac_config: mode=fixed//100Mbps/Full adv=00000,00000208 pause=10 link=1 an=1
> [  309.998451] mv88e6085 gpio-0:00: p5: Force link down
> [  309.998869] mv88e6085 gpio-0:00: p5: Speed set to 100 Mbps
> [  309.999280] mv88e6085 gpio-0:00: p5: Force full duplex
> [  309.999895] mv88e6085 gpio-0:00: p5: Force link up
> [  311.032400] mv88e6085 gpio-0:00 enembc: phylink_mac_config: mode=fixed//100Mbps/Full adv=00000,00000208 pause=10 link=1 an=1
> [  311.038248] mv88e6085 gpio-0:00: p5: Force link down
> [  311.038660] mv88e6085 gpio-0:00: p5: Speed set to 100 Mbps
> [  311.039069] mv88e6085 gpio-0:00: p5: Force full duplex
> [  311.039678] mv88e6085 gpio-0:00: p5: Force link up
> [  312.072328] mv88e6085 gpio-0:00 enembc: phylink_mac_config: mode=fixed//100Mbps/Full adv=00000,00000208 pause=10 link=1 an=1
> [  312.077603] mv88e6085 gpio-0:00: p5: Force link down
> [  312.078010] mv88e6085 gpio-0:00: p5: Speed set to 100 Mbps
> [  312.078417] mv88e6085 gpio-0:00: p5: Force full duplex
> [  312.079026] mv88e6085 gpio-0:00: p5: Force link up

Hi Samu

Please could you try this patch. I've not had chance to properly test
it, and i'm about to go away for a long weekend.

Thanks
	Andrew

From 02438712824d3c7a9dd1aab8bd5dfb4383e55bfd Mon Sep 17 00:00:00 2001
From: Andrew Lunn <andrew@lunn.ch>
Date: Wed, 16 Jan 2019 20:22:04 -0600
Subject: [PATCH] net: phy: phylink: Only change mac when fixed link changes
 state

phylink polls the fixed-link once per second to see if the GPIO has
changed state, or if the callback indicates if there has been a change
in state. It then calls the MAC to reconfigure itself to the current
state.

For some MACs, reconfiguration can result in packets being dropped.
Hence keep track of the link state between polls and only reconfigure
the MAC if there has been a change of state.

Reported-by: Samu Nuutamo <samu.nuutamo@vincit.fi>
Fixes: 9cd00a8aa42e ("net: phy: phylink: Poll link GPIOs")
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
---
 drivers/net/phy/phylink.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c
index e7becc7379d7..6473af228842 100644
--- a/drivers/net/phy/phylink.c
+++ b/drivers/net/phy/phylink.c
@@ -407,11 +407,14 @@ static void phylink_resolve(struct work_struct *w)
 			phylink_mac_config(pl, &link_state);
 			break;
 
-		case MLO_AN_FIXED:
-			phylink_get_fixed_state(pl, &link_state);
-			phylink_mac_config(pl, &link_state);
-			break;
+		case MLO_AN_FIXED: {
+			bool old_link = pl->phy_state.link;
 
+			phylink_get_fixed_state(pl, &pl->phy_state);
+			if (pl->phy_state.link != old_link)
+				phylink_mac_config(pl, &pl->phy_state);
+			break;
+		}
 		case MLO_AN_INBAND:
 			phylink_get_mac_state(pl, &link_state);
 			if (pl->phydev) {
Samu Nuutamo Jan. 17, 2019, 11:37 a.m. UTC | #5
On Thu, Jan 17, 2019 at 03:29:35AM +0100, Andrew Lunn wrote:
> On Mon, Jan 14, 2019 at 01:34:44PM +0200, Samu Nuutamo wrote:
> > Hi,
> > 
> > We have an imx6q-b450v3 board that has one of the switch ports configured as a
> > fixed link. After upgrading the kernel to version 4.18 the link has experienced
> > a small amount of packet loss, around 0.15%.
> > 
> > The issue was bisected to a commit: aab9c4067d23 net: dsa: Plug in PHYLINK support
> > 
> > After enabling debug we could see that the new phylink code causes the link to
> > reset once every second:
> > 
> > [  309.992368] mv88e6085 gpio-0:00 enembc: phylink_mac_config: mode=fixed//100Mbps/Full adv=00000,00000208 pause=10 link=1 an=1
> > [  309.998451] mv88e6085 gpio-0:00: p5: Force link down
> > [  309.998869] mv88e6085 gpio-0:00: p5: Speed set to 100 Mbps
> > [  309.999280] mv88e6085 gpio-0:00: p5: Force full duplex
> > [  309.999895] mv88e6085 gpio-0:00: p5: Force link up
> > [  311.032400] mv88e6085 gpio-0:00 enembc: phylink_mac_config: mode=fixed//100Mbps/Full adv=00000,00000208 pause=10 link=1 an=1
> > [  311.038248] mv88e6085 gpio-0:00: p5: Force link down
> > [  311.038660] mv88e6085 gpio-0:00: p5: Speed set to 100 Mbps
> > [  311.039069] mv88e6085 gpio-0:00: p5: Force full duplex
> > [  311.039678] mv88e6085 gpio-0:00: p5: Force link up
> > [  312.072328] mv88e6085 gpio-0:00 enembc: phylink_mac_config: mode=fixed//100Mbps/Full adv=00000,00000208 pause=10 link=1 an=1
> > [  312.077603] mv88e6085 gpio-0:00: p5: Force link down
> > [  312.078010] mv88e6085 gpio-0:00: p5: Speed set to 100 Mbps
> > [  312.078417] mv88e6085 gpio-0:00: p5: Force full duplex
> > [  312.079026] mv88e6085 gpio-0:00: p5: Force link up
> 
> Hi Samu
> 
> Please could you try this patch. I've not had chance to properly test
> it, and i'm about to go away for a long weekend.
> 
> Thanks
> 	Andrew
> 
> From 02438712824d3c7a9dd1aab8bd5dfb4383e55bfd Mon Sep 17 00:00:00 2001
> From: Andrew Lunn <andrew@lunn.ch>
> Date: Wed, 16 Jan 2019 20:22:04 -0600
> Subject: [PATCH] net: phy: phylink: Only change mac when fixed link changes
>  state
> 
> phylink polls the fixed-link once per second to see if the GPIO has
> changed state, or if the callback indicates if there has been a change
> in state. It then calls the MAC to reconfigure itself to the current
> state.
> 
> For some MACs, reconfiguration can result in packets being dropped.
> Hence keep track of the link state between polls and only reconfigure
> the MAC if there has been a change of state.
> 
> Reported-by: Samu Nuutamo <samu.nuutamo@vincit.fi>
> Fixes: 9cd00a8aa42e ("net: phy: phylink: Poll link GPIOs")
> Signed-off-by: Andrew Lunn <andrew@lunn.ch>
> ---
>  drivers/net/phy/phylink.c | 11 +++++++----
>  1 file changed, 7 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c
> index e7becc7379d7..6473af228842 100644
> --- a/drivers/net/phy/phylink.c
> +++ b/drivers/net/phy/phylink.c
> @@ -407,11 +407,14 @@ static void phylink_resolve(struct work_struct *w)
>  			phylink_mac_config(pl, &link_state);
>  			break;
>  
> -		case MLO_AN_FIXED:
> -			phylink_get_fixed_state(pl, &link_state);
> -			phylink_mac_config(pl, &link_state);
> -			break;
> +		case MLO_AN_FIXED: {
> +			bool old_link = pl->phy_state.link;
>  
> +			phylink_get_fixed_state(pl, &pl->phy_state);
> +			if (pl->phy_state.link != old_link)
> +				phylink_mac_config(pl, &pl->phy_state);
> +			break;
> +		}
>  		case MLO_AN_INBAND:
>  			phylink_get_mac_state(pl, &link_state);
>  			if (pl->phydev) {
> -- 
> 2.20.1
> 

Hi Andrew,

I tested the patch and it has an issue that prevents the carrier from turning
on. I think it's caused by the change from using link_state to phy->phy_state,
as later in the same function the link_state attributes are read when
deciding if the carrier needs to be turned on.


- Samu
Samu Nuutamo Jan. 17, 2019, 3:46 p.m. UTC | #6
On Thu, Jan 17, 2019 at 03:29:35AM +0100, Andrew Lunn wrote:
> Hi Samu
> 
> Please could you try this patch. I've not had chance to properly test
> it, and i'm about to go away for a long weekend.
> 
> Thanks
> 	Andrew
> 
> From 02438712824d3c7a9dd1aab8bd5dfb4383e55bfd Mon Sep 17 00:00:00 2001
> From: Andrew Lunn <andrew@lunn.ch>
> Date: Wed, 16 Jan 2019 20:22:04 -0600
> Subject: [PATCH] net: phy: phylink: Only change mac when fixed link changes
>  state
> 
> phylink polls the fixed-link once per second to see if the GPIO has
> changed state, or if the callback indicates if there has been a change
> in state. It then calls the MAC to reconfigure itself to the current
> state.
> 
> For some MACs, reconfiguration can result in packets being dropped.
> Hence keep track of the link state between polls and only reconfigure
> the MAC if there has been a change of state.
> 
> Reported-by: Samu Nuutamo <samu.nuutamo@vincit.fi>
> Fixes: 9cd00a8aa42e ("net: phy: phylink: Poll link GPIOs")
> Signed-off-by: Andrew Lunn <andrew@lunn.ch>
> ---
>  drivers/net/phy/phylink.c | 11 +++++++----
>  1 file changed, 7 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c
> index e7becc7379d7..6473af228842 100644
> --- a/drivers/net/phy/phylink.c
> +++ b/drivers/net/phy/phylink.c
> @@ -407,11 +407,14 @@ static void phylink_resolve(struct work_struct *w)
>  			phylink_mac_config(pl, &link_state);
>  			break;
>  
> -		case MLO_AN_FIXED:
> -			phylink_get_fixed_state(pl, &link_state);
> -			phylink_mac_config(pl, &link_state);
> -			break;
> +		case MLO_AN_FIXED: {
> +			bool old_link = pl->phy_state.link;
>  
> +			phylink_get_fixed_state(pl, &pl->phy_state);
> +			if (pl->phy_state.link != old_link)
> +				phylink_mac_config(pl, &pl->phy_state);
> +			break;
> +		}
>  		case MLO_AN_INBAND:
>  			phylink_get_mac_state(pl, &link_state);
>  			if (pl->phydev) {
> -- 
> 2.20.1
> 

Hi Andrew,

I've debugged the issue further by dumping all the values inside
phylink_resolve to get a better understand how the link state behaves.

As this is a fixed link the link_state structure is populated by
phylink_get_fixed_state(), but in our case neither get_fixed_state callback
or GPIO is used. This means the link_state comes straight from the
link_config, meaning link_state.link will always be 1. On the other hand the
pl->phy_state seems to never change and the phy_state.link stays 0 even
when the link is up and functional.

This makes it impossible to use these variables for deciding if
phylink_mac_config needs to be run, and this got me thinking: do we even need
to reconfigure the link? The phylink_mac_config() is already called from
phylink_start and fixed links shouldn't change(?).

I created the following patch that simply removes the phylink_mac_config() call
and everything seems to be working. The patch was only tested with the board
we have on hand.

- Samu

From 6786cc3b9fef40adb3f68c72236220fafaa26eee Mon Sep 17 00:00:00 2001
From: Samu Nuutamo <samu.nuutamo@vincit.fi>
Date: Thu, 17 Jan 2019 16:42:29 +0200
Subject: [PATCH] net: phy: phylink: Configure fixed links only once

This prevents unnecessary link restarts from causing packet loss.

Fixes: 9cd00a8aa42e ("net: phy: phylink: Poll link GPIOs")
Signed-off-by: Samu Nuutamo <samu.nuutamo@vincit.fi>
---
 drivers/net/phy/phylink.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c
index 70f3f90c2ed6..1ef76382c593 100644
--- a/drivers/net/phy/phylink.c
+++ b/drivers/net/phy/phylink.c
@@ -437,7 +437,6 @@ static void phylink_resolve(struct work_struct *w)

        case MLO_AN_FIXED:
            phylink_get_fixed_state(pl, &link_state);
-           phylink_mac_config(pl, &link_state);
            break;

        case MLO_AN_INBAND:
--
2.17.1
Andrew Lunn Jan. 21, 2019, 5:52 p.m. UTC | #7
> Hi Andrew,
> 
> I've debugged the issue further by dumping all the values inside
> phylink_resolve to get a better understand how the link state behaves.
> 
> As this is a fixed link the link_state structure is populated by
> phylink_get_fixed_state(), but in our case neither get_fixed_state callback
> or GPIO is used. This means the link_state comes straight from the
> link_config, meaning link_state.link will always be 1. On the other hand the
> pl->phy_state seems to never change and the phy_state.link stays 0 even
> when the link is up and functional.
> 
> This makes it impossible to use these variables for deciding if
> phylink_mac_config needs to be run, and this got me thinking: do we even need
> to reconfigure the link? The phylink_mac_config() is already called from
> phylink_start and fixed links shouldn't change(?).

Fixed links do change, they can read a GPIO to tell you if the link is
up/down, and there is a callback which can be used to indicate if the
link is up or down. If you remove that call as suggested, you break a
number of boards.

The state tracking has to be made to work, so that
phylink_mac_config() is called once on state change, and not more.

     Andrew
Andrew Lunn Jan. 23, 2019, 2:10 a.m. UTC | #8
> Hi Andrew,
> 
> I tested the patch and it has an issue that prevents the carrier from turning
> on. I think it's caused by the change from using link_state to phy->phy_state,
> as later in the same function the link_state attributes are read when
> deciding if the carrier needs to be turned on.
 
Hi Samu

This version has had better testing. So i'm a bit more confident about
it.

	Andrew

From d093492f9aba75cb4844d15148e0efd014f193bb Mon Sep 17 00:00:00 2001
From: Andrew Lunn <andrew@lunn.ch>
Date: Wed, 16 Jan 2019 20:22:04 -0600
Subject: [PATCH] net: phy: phylink: Only change mac when fixed link changes
 state

phylink polls the fixed-link once per second to see if the GPIO has
changed state, or if the callback indicates if there has been a change
in state. It then calls the MAC to reconfigure itself to the current
state.

For some MACs, reconfiguration can result in packets being dropped.
Hence only reconfigure the MAC if the link state differs from the
netdev carrier state.

Reported-by: Samu Nuutamo <samu.nuutamo@vincit.fi>
Fixes: 9cd00a8aa42e ("net: phy: phylink: Poll link GPIOs")
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
---
 drivers/net/phy/phylink.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c
index e7becc7379d7..394b1eb5b55a 100644
--- a/drivers/net/phy/phylink.c
+++ b/drivers/net/phy/phylink.c
@@ -409,7 +409,8 @@ static void phylink_resolve(struct work_struct *w)
 
 		case MLO_AN_FIXED:
 			phylink_get_fixed_state(pl, &link_state);
-			phylink_mac_config(pl, &link_state);
+			if (link_state.link != netif_carrier_ok(ndev))
+				phylink_mac_config(pl, &link_state);
 			break;
 
 		case MLO_AN_INBAND:
Samu Nuutamo Jan. 23, 2019, 7:16 a.m. UTC | #9
On Wed, Jan 23, 2019 at 03:10:32AM +0100, Andrew Lunn wrote:
> > Hi Andrew,
> > 
> > I tested the patch and it has an issue that prevents the carrier from turning
> > on. I think it's caused by the change from using link_state to phy->phy_state,
> > as later in the same function the link_state attributes are read when
> > deciding if the carrier needs to be turned on.
>  
> Hi Samu
> 
> This version has had better testing. So i'm a bit more confident about
> it.
> 
> 	Andrew
> 
> From d093492f9aba75cb4844d15148e0efd014f193bb Mon Sep 17 00:00:00 2001
> From: Andrew Lunn <andrew@lunn.ch>
> Date: Wed, 16 Jan 2019 20:22:04 -0600
> Subject: [PATCH] net: phy: phylink: Only change mac when fixed link changes
>  state
> 
> phylink polls the fixed-link once per second to see if the GPIO has
> changed state, or if the callback indicates if there has been a change
> in state. It then calls the MAC to reconfigure itself to the current
> state.
> 
> For some MACs, reconfiguration can result in packets being dropped.
> Hence only reconfigure the MAC if the link state differs from the
> netdev carrier state.
> 
> Reported-by: Samu Nuutamo <samu.nuutamo@vincit.fi>
> Fixes: 9cd00a8aa42e ("net: phy: phylink: Poll link GPIOs")
> Signed-off-by: Andrew Lunn <andrew@lunn.ch>
> ---
>  drivers/net/phy/phylink.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c
> index e7becc7379d7..394b1eb5b55a 100644
> --- a/drivers/net/phy/phylink.c
> +++ b/drivers/net/phy/phylink.c
> @@ -409,7 +409,8 @@ static void phylink_resolve(struct work_struct *w)
>  
>  		case MLO_AN_FIXED:
>  			phylink_get_fixed_state(pl, &link_state);
> -			phylink_mac_config(pl, &link_state);
> +			if (link_state.link != netif_carrier_ok(ndev))
> +				phylink_mac_config(pl, &link_state);
>  			break;
>  
>  		case MLO_AN_INBAND:
> -- 
> 2.20.1
> 
Hi Andrew,

I did a quick test and this latest patch seems to work. Thank you.


- Samu
Florian Fainelli Jan. 25, 2019, 7 p.m. UTC | #10
On 1/22/19 6:10 PM, Andrew Lunn wrote:
>> Hi Andrew,
>>
>> I tested the patch and it has an issue that prevents the carrier from turning
>> on. I think it's caused by the change from using link_state to phy->phy_state,
>> as later in the same function the link_state attributes are read when
>> deciding if the carrier needs to be turned on.
>  
> Hi Samu
> 
> This version has had better testing. So i'm a bit more confident about
> it.
> 
> 	Andrew
> 
> From d093492f9aba75cb4844d15148e0efd014f193bb Mon Sep 17 00:00:00 2001
> From: Andrew Lunn <andrew@lunn.ch>
> Date: Wed, 16 Jan 2019 20:22:04 -0600
> Subject: [PATCH] net: phy: phylink: Only change mac when fixed link changes
>  state
> 
> phylink polls the fixed-link once per second to see if the GPIO has
> changed state, or if the callback indicates if there has been a change
> in state. It then calls the MAC to reconfigure itself to the current
> state.
> 
> For some MACs, reconfiguration can result in packets being dropped.
> Hence only reconfigure the MAC if the link state differs from the
> netdev carrier state.
> 
> Reported-by: Samu Nuutamo <samu.nuutamo@vincit.fi>
> Fixes: 9cd00a8aa42e ("net: phy: phylink: Poll link GPIOs")
> Signed-off-by: Andrew Lunn <andrew@lunn.ch>
> ---
>  drivers/net/phy/phylink.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c
> index e7becc7379d7..394b1eb5b55a 100644
> --- a/drivers/net/phy/phylink.c
> +++ b/drivers/net/phy/phylink.c
> @@ -409,7 +409,8 @@ static void phylink_resolve(struct work_struct *w)
>  
>  		case MLO_AN_FIXED:
>  			phylink_get_fixed_state(pl, &link_state);
> -			phylink_mac_config(pl, &link_state);
> +			if (link_state.link != netif_carrier_ok(ndev))
> +				phylink_mac_config(pl, &link_state);

With that change though, I don't think we would ever be able to disable
the MAC upon link down which could be sucking additional power, so maybe
we should be comparing link_state.link with the previous link_state
(prior to calling phylink_get_fixed_state(), so something like that:

diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c
index e7becc7379d7..88e0045ecf79 100644
--- a/drivers/net/phy/phylink.c
+++ b/drivers/net/phy/phylink.c
@@ -409,7 +409,8 @@ static void phylink_resolve(struct work_struct *w)

                case MLO_AN_FIXED:
                        phylink_get_fixed_state(pl, &link_state);
-                       phylink_mac_config(pl, &link_state);
+                       if (pl->mac_link_dropped != link_state.link)
+                               phylink_mac_config(pl, &link_state);
                        break;

                case MLO_AN_INBAND:


It seems to me that the problem might very well be mv88e6xxx specific in
that the function mv88e6xxx_port_setup_mac() calls into multiple other
functions in order to configure the port's MAC and that is not atomic
from the HW's perspective, therefore there is a chance for the HW to
latch in some register writes (even if they are the same values) trigger
a MAC reconfiguration upon that write, which is responsible for causing
some packets to be dropped.
Russell King (Oracle) Jan. 25, 2019, 7:15 p.m. UTC | #11
On Fri, Jan 25, 2019 at 11:00:46AM -0800, Florian Fainelli wrote:
> > +++ b/drivers/net/phy/phylink.c
> > @@ -409,7 +409,8 @@ static void phylink_resolve(struct work_struct *w)
> >  
> >  		case MLO_AN_FIXED:
> >  			phylink_get_fixed_state(pl, &link_state);
> > -			phylink_mac_config(pl, &link_state);
> > +			if (link_state.link != netif_carrier_ok(ndev))
> > +				phylink_mac_config(pl, &link_state);
> 
> With that change though, I don't think we would ever be able to disable
> the MAC upon link down which could be sucking additional power, so maybe
> we should be comparing link_state.link with the previous link_state
> (prior to calling phylink_get_fixed_state(), so something like that:
> 
> diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c
> index e7becc7379d7..88e0045ecf79 100644
> --- a/drivers/net/phy/phylink.c
> +++ b/drivers/net/phy/phylink.c
> @@ -409,7 +409,8 @@ static void phylink_resolve(struct work_struct *w)
> 
>                 case MLO_AN_FIXED:
>                         phylink_get_fixed_state(pl, &link_state);
> -                       phylink_mac_config(pl, &link_state);
> +                       if (pl->mac_link_dropped != link_state.link)
> +                               phylink_mac_config(pl, &link_state);
>                         break;
> 
>                 case MLO_AN_INBAND:
> 
> 
> It seems to me that the problem might very well be mv88e6xxx specific in
> that the function mv88e6xxx_port_setup_mac() calls into multiple other
> functions in order to configure the port's MAC and that is not atomic
> from the HW's perspective, therefore there is a chance for the HW to
> latch in some register writes (even if they are the same values) trigger
> a MAC reconfiguration upon that write, which is responsible for causing
> some packets to be dropped.

The mac_config is expected to _update_ the MAC, not reprogram it in
full each time it's called.

Consider a SGMII link with a PHY on a SFP module.  The SGMII link
must be placed in "negotiation" mode (so must use in-band negotiation)
but there is no mechanism on the link to convey the negotiated flow
control settings.  So, mac_config() must be called whenever the link
changes to update the MAC with the flow control settings read from
the PHY.  However, the MAC still needs negotiation enabled on the
link, and must not disturb that, because it could cause the PHY to
then report link-down.  That would throw phylink into a continual
loop of the link bouncing up and down.

This is how I've implemented mac_config() in both mvneta and the out
of tree mvpp2x driver - in mvneta, I read the current register
settings, modify them according to the requested parameters, then
compare them, and only write the new register settings back as
required taking account of hardware restrictions (eg, port needs to
be temporarily disabled if certain settings are changed.)

If mac_config() is implemented as per that, then the above patch is
unnecessary.

I know that detail is not documented, it's something that I need to
update in the header file and my .rst documentation (also something
else that needs submitting.)

I also have a patch for phylink that allows us to use the GPIO
interrupt rather than polling with a 1s timer to check the state of
the fixed-link GPIO.
diff mbox series

Patch

--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -532,6 +532,17 @@  static int mv88e6xxx_port_setup_mac(struct mv88e6xxx_chip *chip, int port,
    if (!chip->info->ops->port_set_link)
        return 0;

+   if (chip->info->ops->port_link_state) {
+       struct phylink_link_state state;
+
+       err = chip->info->ops->port_link_state(chip, port, &state);
+
+       if (!err && link == state.link) {
+           dev_dbg(chip->dev, "link state already set\n");
+           return 0;
+       }
+   }
+
    /* Port's MAC control must not be changed unless the link is down */
    err = chip->info->ops->port_set_link(chip, port, 0);
    if (err)