mbox series

[net-next,0/5] net: phy: improve and simplify phylib state machine

Message ID 922c223b-7bc0-e0ec-345d-2034b796af91@gmail.com
Headers show
Series net: phy: improve and simplify phylib state machine | expand

Message

Heiner Kallweit Nov. 7, 2018, 7:41 p.m. UTC
This patch series is based on two axioms:

- During autoneg a PHY always reports the link being down

- Info in clause 22/45 registers doesn't allow to differentiate between
  these two states:
  1. Link is physically down
  2. A link partner is connected and PHY is autonegotiating
  In both cases "link up" and "aneg finished" bits aren't set.
  One consequence is that having separate states PHY_NOLINK and PHY_AN
  isn't needed.

By using these two axioms the state machine can be significantly
simplified.

Heiner Kallweit (5):
  net: phy: remove useless check in state machine case PHY_NOLINK
  net: phy: remove useless check in state machine case PHY_RESUMING
  net: phy: add phy_check_link_status
  net: phy: remove state PHY_AN
  net: phy: use phy_check_link_status in more places in the state machine

 drivers/net/phy/phy.c | 172 +++++++++++-------------------------------
 include/linux/phy.h   |  19 +----
 2 files changed, 46 insertions(+), 145 deletions(-)

Comments

Andrew Lunn Nov. 7, 2018, 7:48 p.m. UTC | #1
On Wed, Nov 07, 2018 at 08:41:52PM +0100, Heiner Kallweit wrote:
> This patch series is based on two axioms:
> 
> - During autoneg a PHY always reports the link being down

Hi Heiner

I think that is a risky assumption to make.

What happens if this assumption is incorrect?

     Andrew
Heiner Kallweit Nov. 7, 2018, 8:05 p.m. UTC | #2
On 07.11.2018 20:48, Andrew Lunn wrote:
> On Wed, Nov 07, 2018 at 08:41:52PM +0100, Heiner Kallweit wrote:
>> This patch series is based on two axioms:
>>
>> - During autoneg a PHY always reports the link being down
> 
> Hi Heiner
> 
> I think that is a risky assumption to make.
> 
I wasn't sure initially too (found no clear rule in 802.3 clause 22)
and therefore asked around. Florian agrees to the assumption,
see here: https://www.spinics.net/lists/netdev/msg519242.html

If a PHY reports the link as up then every user would assume that
data can be transferred. But that's not the case during aneg.
Therefore reporting the link as up during aneg wouldn't make sense.

> What happens if this assumption is incorrect?
> 
Then we have to flush this patch series down the drain ;)
At least I would have to check in detail which parts need to be
changed. I clearly mention the assumptions so that every
reviewer can check whether he agrees.

>      Andrew
> 
Heiner
Andrew Lunn Nov. 7, 2018, 8:21 p.m. UTC | #3
On Wed, Nov 07, 2018 at 09:05:49PM +0100, Heiner Kallweit wrote:
> On 07.11.2018 20:48, Andrew Lunn wrote:
> > On Wed, Nov 07, 2018 at 08:41:52PM +0100, Heiner Kallweit wrote:
> >> This patch series is based on two axioms:
> >>
> >> - During autoneg a PHY always reports the link being down
> > 
> > Hi Heiner
> > 
> > I think that is a risky assumption to make.
> > 
> I wasn't sure initially too (found no clear rule in 802.3 clause 22)
> and therefore asked around. Florian agrees to the assumption,
> see here: https://www.spinics.net/lists/netdev/msg519242.html
> 
> If a PHY reports the link as up then every user would assume that
> data can be transferred. But that's not the case during aneg.
> Therefore reporting the link as up during aneg wouldn't make sense.

Hi Heiner

If auto-neg has already been completed once before, i can see a lazy
hardware designed not reporting link down, or at least, not until
auto-neg actually fails.

And what about if link is down for too short a time for us to notice?
I've seen some code fail because the kernel went off and did something
else for too long, and a state change was missed. 

> > What happens if this assumption is incorrect?
> > 
> Then we have to flush this patch series down the drain ;)
> At least I would have to check in detail which parts need to be
> changed. I clearly mention the assumptions so that every
> reviewer can check whether he agrees.

Thanks for doing that. I want to be happy this is safe, and not going
to introduce regressions.

   Andrew
Heiner Kallweit Nov. 7, 2018, 8:45 p.m. UTC | #4
On 07.11.2018 21:21, Andrew Lunn wrote:
> On Wed, Nov 07, 2018 at 09:05:49PM +0100, Heiner Kallweit wrote:
>> On 07.11.2018 20:48, Andrew Lunn wrote:
>>> On Wed, Nov 07, 2018 at 08:41:52PM +0100, Heiner Kallweit wrote:
>>>> This patch series is based on two axioms:
>>>>
>>>> - During autoneg a PHY always reports the link being down
>>>
>>> Hi Heiner
>>>
>>> I think that is a risky assumption to make.
>>>
>> I wasn't sure initially too (found no clear rule in 802.3 clause 22)
>> and therefore asked around. Florian agrees to the assumption,
>> see here: https://www.spinics.net/lists/netdev/msg519242.html
>>
>> If a PHY reports the link as up then every user would assume that
>> data can be transferred. But that's not the case during aneg.
>> Therefore reporting the link as up during aneg wouldn't make sense.
> 
> Hi Heiner
> 
> If auto-neg has already been completed once before, i can see a lazy
> hardware designed not reporting link down, or at least, not until
> auto-neg actually fails.
> 
"aneg finished" flag means that the aneg parameters in the register set
are valid. Once the link goes down that's not necessarily the case any
longer. E.g. some PHYs have an "auto speed down" feature and reduce
the speed to save power once they detect the link is down.
Of course I can not rule out that there are broken designs (or as you
stated more politely: lazy designs) out there. But in this case I assume
we would see issues already. And we would have to think about whether we
want to support such broken / lazy designs in phylib.

> And what about if link is down for too short a time for us to notice?
> I've seen some code fail because the kernel went off and did something
> else for too long, and a state change was missed. 
> 
This is a case we have already, independent of my change.
genphy_update_link() reads BMSR twice, thus ignoring potential latched
info about a temporary link failure. When polling phylib ignores
everything that happens between two poll intervals.

>>> What happens if this assumption is incorrect?
>>>
>> Then we have to flush this patch series down the drain ;)
>> At least I would have to check in detail which parts need to be
>> changed. I clearly mention the assumptions so that every
>> reviewer can check whether he agrees.
> 
> Thanks for doing that. I want to be happy this is safe, and not going
> to introduce regressions.
> 
>    Andrew
>
Heiner Kallweit Nov. 8, 2018, 7:20 a.m. UTC | #5
On 07.11.2018 21:45, Heiner Kallweit wrote:
> On 07.11.2018 21:21, Andrew Lunn wrote:
>> On Wed, Nov 07, 2018 at 09:05:49PM +0100, Heiner Kallweit wrote:
>>> On 07.11.2018 20:48, Andrew Lunn wrote:
>>>> On Wed, Nov 07, 2018 at 08:41:52PM +0100, Heiner Kallweit wrote:
>>>>> This patch series is based on two axioms:
>>>>>
>>>>> - During autoneg a PHY always reports the link being down
>>>>
>>>> Hi Heiner
>>>>
>>>> I think that is a risky assumption to make.
>>>>
>>> I wasn't sure initially too (found no clear rule in 802.3 clause 22)
>>> and therefore asked around. Florian agrees to the assumption,
>>> see here: https://www.spinics.net/lists/netdev/msg519242.html
>>>
>>> If a PHY reports the link as up then every user would assume that
>>> data can be transferred. But that's not the case during aneg.
>>> Therefore reporting the link as up during aneg wouldn't make sense.
>>
>> Hi Heiner
>>
>> If auto-neg has already been completed once before, i can see a lazy
>> hardware designed not reporting link down, or at least, not until
>> auto-neg actually fails.
>>
> "aneg finished" flag means that the aneg parameters in the register set
> are valid. Once the link goes down that's not necessarily the case any
> longer. E.g. some PHYs have an "auto speed down" feature and reduce
> the speed to save power once they detect the link is down.
> Of course I can not rule out that there are broken designs (or as you
> stated more politely: lazy designs) out there. But in this case I assume
> we would see issues already. And we would have to think about whether we
> want to support such broken / lazy designs in phylib.
> 
Had one more look at the changes to check what happens if "link up" and
"aneg done" are not in sync.

When link goes down the changes don't change current behavior. We just
check the "link up" bit.

When link goes up and aneg isn't finished yet, then we would report
"link is up" earlier to userspace than we do now. If userspace starts
some network activity based on the "link up" event then they may fail.
But it really would be a major flaw if a PHY signals "link up" whilst
it's not actually ready yet to transfer data.

In case of such a broken design we would have issues with the current
code already, at least if interrupts are used. The "link up" interrupt
would cause the state machine to go to PHY_CHANGELINK which doesn't
check for aneg status.

>> And what about if link is down for too short a time for us to notice?
>> I've seen some code fail because the kernel went off and did something
>> else for too long, and a state change was missed. 
>>
> This is a case we have already, independent of my change.
> genphy_update_link() reads BMSR twice, thus ignoring potential latched
> info about a temporary link failure. When polling phylib ignores
> everything that happens between two poll intervals.
> 
>>>> What happens if this assumption is incorrect?
>>>>
>>> Then we have to flush this patch series down the drain ;)
>>> At least I would have to check in detail which parts need to be
>>> changed. I clearly mention the assumptions so that every
>>> reviewer can check whether he agrees.
>>
>> Thanks for doing that. I want to be happy this is safe, and not going
>> to introduce regressions.
>>
>>    Andrew
>>
>
David Miller Nov. 8, 2018, 10:58 p.m. UTC | #6
From: Heiner Kallweit <hkallweit1@gmail.com>
Date: Wed, 7 Nov 2018 20:41:52 +0100

> This patch series is based on two axioms:
> 
> - During autoneg a PHY always reports the link being down
> 
> - Info in clause 22/45 registers doesn't allow to differentiate between
>   these two states:
>   1. Link is physically down
>   2. A link partner is connected and PHY is autonegotiating
>   In both cases "link up" and "aneg finished" bits aren't set.
>   One consequence is that having separate states PHY_NOLINK and PHY_AN
>   isn't needed.
> 
> By using these two axioms the state machine can be significantly
> simplified.

So how are we going to move forward on this?

Maybe we can apply this series and just watch carefully for any
problems that get reported or are found?
Florian Fainelli Nov. 8, 2018, 11 p.m. UTC | #7
On 11/8/18 2:58 PM, David Miller wrote:
> From: Heiner Kallweit <hkallweit1@gmail.com>
> Date: Wed, 7 Nov 2018 20:41:52 +0100
> 
>> This patch series is based on two axioms:
>>
>> - During autoneg a PHY always reports the link being down
>>
>> - Info in clause 22/45 registers doesn't allow to differentiate between
>>   these two states:
>>   1. Link is physically down
>>   2. A link partner is connected and PHY is autonegotiating
>>   In both cases "link up" and "aneg finished" bits aren't set.
>>   One consequence is that having separate states PHY_NOLINK and PHY_AN
>>   isn't needed.
>>
>> By using these two axioms the state machine can be significantly
>> simplified.
> 
> So how are we going to move forward on this?
> 
> Maybe we can apply this series and just watch carefully for any
> problems that get reported or are found?

Given Heiner is always responsive and taking care of fixing what might
be/have broken, no objections with me on that.
Andrew Lunn Nov. 8, 2018, 11:01 p.m. UTC | #8
On Thu, Nov 08, 2018 at 03:00:01PM -0800, Florian Fainelli wrote:
> On 11/8/18 2:58 PM, David Miller wrote:
> > From: Heiner Kallweit <hkallweit1@gmail.com>
> > Date: Wed, 7 Nov 2018 20:41:52 +0100
> > 
> >> This patch series is based on two axioms:
> >>
> >> - During autoneg a PHY always reports the link being down
> >>
> >> - Info in clause 22/45 registers doesn't allow to differentiate between
> >>   these two states:
> >>   1. Link is physically down
> >>   2. A link partner is connected and PHY is autonegotiating
> >>   In both cases "link up" and "aneg finished" bits aren't set.
> >>   One consequence is that having separate states PHY_NOLINK and PHY_AN
> >>   isn't needed.
> >>
> >> By using these two axioms the state machine can be significantly
> >> simplified.
> > 
> > So how are we going to move forward on this?
> > 
> > Maybe we can apply this series and just watch carefully for any
> > problems that get reported or are found?
> 
> Given Heiner is always responsive and taking care of fixing what might
> be/have broken, no objections with me on that.

Yes, lets try it.

     Andrew
David Miller Nov. 8, 2018, 11:04 p.m. UTC | #9
From: Florian Fainelli <f.fainelli@gmail.com>
Date: Thu, 8 Nov 2018 15:00:01 -0800

> On 11/8/18 2:58 PM, David Miller wrote:
>> From: Heiner Kallweit <hkallweit1@gmail.com>
>> Date: Wed, 7 Nov 2018 20:41:52 +0100
>> 
>>> This patch series is based on two axioms:
>>>
>>> - During autoneg a PHY always reports the link being down
>>>
>>> - Info in clause 22/45 registers doesn't allow to differentiate between
>>>   these two states:
>>>   1. Link is physically down
>>>   2. A link partner is connected and PHY is autonegotiating
>>>   In both cases "link up" and "aneg finished" bits aren't set.
>>>   One consequence is that having separate states PHY_NOLINK and PHY_AN
>>>   isn't needed.
>>>
>>> By using these two axioms the state machine can be significantly
>>> simplified.
>> 
>> So how are we going to move forward on this?
>> 
>> Maybe we can apply this series and just watch carefully for any
>> problems that get reported or are found?
> 
> Given Heiner is always responsive and taking care of fixing what might
> be/have broken, no objections with me on that.

Great, I've applied this series to net-next then.

Thanks.