diff mbox series

net: phylink: dsa: mv88e6xxx: flaky link detection on switch ports with internal PHYs

Message ID 49eec816-9238-c893-0860-602aa8965515@bell.net
State RFC
Delegated to: David Miller
Headers show
Series net: phylink: dsa: mv88e6xxx: flaky link detection on switch ports with internal PHYs | expand

Commit Message

John David Anglin Jan. 22, 2019, 7:16 p.m. UTC
I've been hacking on a espressobin board to try to improve ptp support,
etc.  However, I have
a big problem with link detection on the wan, lan0 and lan1 ports.

I have a standard bridge configuration using systemd-networkd. 
Currently, I'm working with linux
v4.20.2.

From power on, none of the wan, lan0, lan1 or br0 achieve link
(LOWER_UP).  networkctl shows
no carrier for these ports.  Disconnecting and reconnecting cables is
not detected and makes no
difference to link state.  I  added a debug printout in
mv88e6352_port_link_state, but the routine
is not called.  As far as I can tell, link state changes are not
detected using PHY interrupts.  And yet,
if the card is rebooted, link detection seems to magically work.

I know that the 88E6341 port registers detect port link (also RJ45 LED)
correctly.

The attached patch fixes link detection at power on.  However, link
state still doesn't update if a cable
is disconnected or moved.

I'm puzzled as to how this is supposed to work.  Thoughts?

Regards,
Dave Anglin

Comments

Andrew Lunn Jan. 22, 2019, 8:28 p.m. UTC | #1
On Tue, Jan 22, 2019 at 02:16:09PM -0500, John David Anglin wrote:
> I've been hacking on a espressobin board to try to improve ptp support,
> etc.  However, I have
> a big problem with link detection on the wan, lan0 and lan1 ports.
> 
> I have a standard bridge configuration using systemd-networkd. 

Hi John

Does it make a difference if you do it by hand? Bring up the master
interface, wan, lan0, lan1, add any bridge you need, etc.

> From power on, none of the wan, lan0, lan1 or br0 achieve link
> (LOWER_UP). 

Is the master interface up? What does ip link show give?

   Andrew
John David Anglin Jan. 22, 2019, 9:40 p.m. UTC | #2
Hi Andrew,

On 1/22/2019 3:28 PM, Andrew Lunn wrote:
> Does it make a difference if you do it by hand? Bring up the master
> interface, wan, lan0, lan1, add any bridge you need, etc.
>
>>  From power on, none of the wan, lan0, lan1 or br0 achieve link
>> (LOWER_UP).
I can explore this but I don't know at the moment.  I believe that all 
interfaces are configured.

In serdes.c, interrupts are used to monitor link changes.  However, 
phy.c doesn't do this and
it doesn't call phylink_mac_change().
> Is the master interface up? What does ip link show give?
The master interface (eth0 on mvneta) always comes up.  The link status 
of this interface
appears to be checked by polling at 1 second intervals.

After a power on boot, this is what I see:

root@espressobin:~# ip link show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode 
DEFAULT group default qlen 1000
     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: bond0: <BROADCAST,MULTICAST,MASTER> mtu 1500 qdisc noop state DOWN 
mode DEFAULT group default qlen 1000
     link/ether 96:80:ba:01:e9:31 brd ff:ff:ff:ff:ff:ff
3: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT 
group default qlen 1000
     link/ether 46:42:f8:2b:d1:4e brd ff:ff:ff:ff:ff:ff
4: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP 
mode DEFAULT group default qlen 1024
     link/ether 3e:c0:66:0a:db:ba brd ff:ff:ff:ff:ff:ff
5: wan@eth0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue 
master br0 state LOWERLAYERDOWN mode DEFAULT group default qlen 1000
     link/ether 3e:c0:66:0a:db:ba brd ff:ff:ff:ff:ff:ff
6: lan0@eth0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue 
master br0 state LOWERLAYERDOWN mode DEFAULT group default qlen 1000
     link/ether 3e:c0:66:0a:db:ba brd ff:ff:ff:ff:ff:ff
7: lan1@eth0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue 
master br0 state LOWERLAYERDOWN mode DEFAULT group default qlen 1000
     link/ether 3e:c0:66:0a:db:ba brd ff:ff:ff:ff:ff:ff
8: br0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state 
DOWN mode DEFAULT group default qlen 1000
     link/ether 00:50:43:25:fb:84 brd ff:ff:ff:ff:ff:ff
root@espressobin:~# networkctl
IDX LINK             TYPE               OPERATIONAL SETUP
   1 lo               loopback           carrier     unmanaged
   2 bond0            bond               off         unmanaged
   3 dummy0           ether              off         unmanaged
   4 eth0             ether              degraded    configured
   5 wan              dsa                no-carrier  configuring
   6 lan0             dsa                no-carrier  configuring
   7 lan1             dsa                no-carrier  configuring
   8 br0              bridge             no-carrier  configuring

8 links listed.

r0: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
         ether 00:50:43:25:fb:84  txqueuelen 1000  (Ethernet)
         RX packets 0  bytes 0 (0.0 B)
         RX errors 0  dropped 0  overruns 0  frame 0
         TX packets 0  bytes 0 (0.0 B)
         TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
         inet6 fe80::3cc0:66ff:fe0a:dbba  prefixlen 64  scopeid 0x20<link>
         ether 3e:c0:66:0a:db:ba  txqueuelen 1024  (Ethernet)
         RX packets 2229  bytes 160191 (156.4 KiB)
         RX errors 0  dropped 0  overruns 0  frame 0
         TX packets 14  bytes 1116 (1.0 KiB)
         TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
         device interrupt 11

lan0: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
         ether 3e:c0:66:0a:db:ba  txqueuelen 1000  (Ethernet)
         RX packets 1075  bytes 55969 (54.6 KiB)
         RX errors 0  dropped 551  overruns 0  frame 0
         TX packets 0  bytes 0 (0.0 B)
         TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lan1: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
         ether 3e:c0:66:0a:db:ba  txqueuelen 1000  (Ethernet)
         RX packets 0  bytes 0 (0.0 B)
         RX errors 0  dropped 0  overruns 0  frame 0
         TX packets 0  bytes 0 (0.0 B)
         TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
         inet 127.0.0.1  netmask 255.0.0.0
         inet6 ::1  prefixlen 128  scopeid 0x10<host>
         loop  txqueuelen 1000  (Local Loopback)
         RX packets 0  bytes 0 (0.0 B)
         RX errors 0  dropped 0  overruns 0  frame 0
         TX packets 0  bytes 0 (0.0 B)
         TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

wan: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
         ether 3e:c0:66:0a:db:ba  txqueuelen 1000  (Ethernet)
         RX packets 1154  bytes 55184 (53.8 KiB)
         RX errors 0  dropped 14  overruns 0  frame 0
         TX packets 0  bytes 0 (0.0 B)
         TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

Then, if I do a "shutdown -r +0", I see:

root@espressobin:~# ip link show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode 
DEFAULT group default qlen 1000
     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: bond0: <BROADCAST,MULTICAST,MASTER> mtu 1500 qdisc noop state DOWN 
mode DEFAULT group default qlen 1000
     link/ether 26:dd:a4:7c:26:3f brd ff:ff:ff:ff:ff:ff
3: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT 
group default qlen 1000
     link/ether 0a:73:61:5a:ae:dc brd ff:ff:ff:ff:ff:ff
4: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP 
mode DEFAULT group default qlen 1024
     link/ether f6:2d:a2:ab:32:4d brd ff:ff:ff:ff:ff:ff
5: wan@eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue 
master br0 state UP mode DEFAULT group default qlen 1000
     link/ether f6:2d:a2:ab:32:4d brd ff:ff:ff:ff:ff:ff
6: lan0@eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue 
master br0 state UP mode DEFAULT group default qlen 1000
     link/ether f6:2d:a2:ab:32:4d brd ff:ff:ff:ff:ff:ff
7: lan1@eth0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue 
master br0 state LOWERLAYERDOWN mode DEFAULT group default qlen 1000
     link/ether f6:2d:a2:ab:32:4d brd ff:ff:ff:ff:ff:ff
8: br0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state 
UP mode DEFAULT group default qlen 1000
     link/ether 00:50:43:25:fb:84 brd ff:ff:ff:ff:ff:ff
root@espressobin:~# networkctl
IDX LINK             TYPE               OPERATIONAL SETUP
   1 lo               loopback           carrier     unmanaged
   2 bond0            bond               off         unmanaged
   3 dummy0           ether              off         unmanaged
   4 eth0             ether              degraded    configured
   5 wan              dsa                degraded    configured
   6 lan0             dsa                degraded    configured
   7 lan1             dsa                no-carrier  configuring
   8 br0              bridge             routable    configured

8 links listed.
root@espressobin:~# ifconfig
br0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
         inet 192.168.0.55  netmask 255.255.255.0  broadcast 192.168.0.255
         inet6 fe80::250:43ff:fe25:fb84  prefixlen 64  scopeid 0x20<link>
         ether 00:50:43:25:fb:84  txqueuelen 1000  (Ethernet)
         RX packets 1184  bytes 237936 (232.3 KiB)
         RX errors 0  dropped 261  overruns 0  frame 0
         TX packets 193  bytes 19444 (18.9 KiB)
         TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
         inet6 fe80::f42d:a2ff:feab:324d  prefixlen 64  scopeid 0x20<link>
         ether f6:2d:a2:ab:32:4d  txqueuelen 1024  (Ethernet)
         RX packets 1331  bytes 276209 (269.7 KiB)
         RX errors 0  dropped 0  overruns 0  frame 0
         TX packets 233  bytes 24352 (23.7 KiB)
         TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
         device interrupt 11

lan0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
         inet6 fe80::f42d:a2ff:feab:324d  prefixlen 64  scopeid 0x20<link>
         ether f6:2d:a2:ab:32:4d  txqueuelen 1000  (Ethernet)
         RX packets 264  bytes 13773 (13.4 KiB)
         RX errors 0  dropped 135  overruns 0  frame 0
         TX packets 18  bytes 1368 (1.3 KiB)
         TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lan1: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
         ether f6:2d:a2:ab:32:4d  txqueuelen 1000  (Ethernet)
         RX packets 0  bytes 0 (0.0 B)
         RX errors 0  dropped 0  overruns 0  frame 0
         TX packets 0  bytes 0 (0.0 B)
         TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
         inet 127.0.0.1  netmask 255.0.0.0
         inet6 ::1  prefixlen 128  scopeid 0x10<host>
         loop  txqueuelen 1000  (Local Loopback)
         RX packets 0  bytes 0 (0.0 B)
         RX errors 0  dropped 0  overruns 0  frame 0
         TX packets 0  bytes 0 (0.0 B)
         TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

wan: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
         inet6 fe80::f42d:a2ff:feab:324d  prefixlen 64  scopeid 0x20<link>
         ether f6:2d:a2:ab:32:4d  txqueuelen 1000  (Ethernet)
         RX packets 1067  bytes 233154 (227.6 KiB)
         RX errors 0  dropped 4  overruns 0  frame 0
         TX packets 203  bytes 20240 (19.7 KiB)
         TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

Everything seems normal.  The wan and lan0 ports are connected. From 
syslog for p2 (lan0):

Jan 22 16:02:34 localhost kernel: [    5.249693] mv88e6085 
d0032004.mdio-mii:01: p2: PortState set to Disabled
Jan 22 16:02:34 localhost kernel: [    5.394717] mv88e6085 
d0032004.mdio-mii:01: p2: Force link down
Jan 22 16:02:34 localhost kernel: [    5.400554] mv88e6085 
d0032004.mdio-mii:01: p2: Speed unforced
Jan 22 16:02:34 localhost kernel: [    5.406558] mv88e6085 
d0032004.mdio-mii:01: p2: Unforce half duplex
Jan 22 16:02:34 localhost kernel: [    5.413007] mv88e6085 
d0032004.mdio-mii:01: p2: Unforce link down
Jan 22 16:02:34 localhost kernel: [    5.421850] mv88e6085 
d0032004.mdio-mii:01: p2: 802.1QMode set to Disabled
Jan 22 16:02:34 localhost kernel: [    5.430197] mv88e6085 
d0032004.mdio-mii:01: p2: FID set to 0
Jan 22 16:02:34 localhost kernel: [    5.433696] mv88e6085 
d0032004.mdio-mii:01: p2: VLANTable set to 001
Jan 22 16:02:34 localhost kernel: [    6.136188] mv88e6085 
d0032004.mdio-mii:01 lan0 (uninitialized): PHY 
[!soc!internal-regs@d0000000!mdio@32004!switch0@1!mdio:12] driver 
[Marvell 88E6390]
Jan 22 16:02:34 localhost kernel: [    8.890947] bridge: filtering via 
arp/ip/ip6tables is no longer available by default. Update your scripts 
to load br_netfilter if you need this.
Jan 22 16:02:34 localhost systemd-networkd[358]: lan0: Joined netdev
Jan 22 16:02:34 localhost systemd-networkd[358]: lan0: Bringing link up
Jan 22 16:02:34 localhost systemd-networkd[358]: lan0: Flags change: +UP
Jan 22 16:02:34 localhost systemd-networkd[358]: lan0: Set link
Jan 22 16:02:34 localhost kernel: [    9.017442] br0: port 2(lan0) 
entered blocking state
Jan 22 16:02:34 localhost kernel: [    9.017478] br0: port 2(lan0) 
entered disabled state
Jan 22 16:02:34 localhost kernel: [    9.018314] mv88e6085 
d0032004.mdio-mii:01: p2: VLANTable set to 009
Jan 22 16:02:34 localhost kernel: [    9.021257] device lan0 entered 
promiscuous mode
Jan 22 16:02:34 localhost kernel: [    9.033658] mv88e6085 
d0032004.mdio-mii:01: p2: VLANTable set to 00b
an 22 16:02:34 localhost kernel: [    9.138112] mv88e6085 
d0032004.mdio-mii:01: p2: PortState set to Blocking/Listening
Jan 22 16:02:34 localhost kernel: [    9.138149] mv88e6085 
d0032004.mdio-mii:01 lan0: configuring for phy/ link mode
Jan 22 16:02:34 localhost kernel: [    9.138158] mv88e6085 
d0032004.mdio-mii:01: port=2, mode=0, link=0
Jan 22 16:02:34 localhost kernel: [    9.138267] mv88e6085 
d0032004.mdio-mii:01: port=2, mode=0, link=0
Jan 22 16:02:34 localhost kernel: [    9.138274] mv88e6085 
d0032004.mdio-mii:01 lan0: Link is Down - mac_link_dropped=0
Jan 22 16:02:34 localhost kernel: [    9.138885] IPv6: 
ADDRCONF(NETDEV_UP): lan0: link is not ready
Jan 22 16:02:34 localhost kernel: [    9.213888] mv88e6085 
d0032004.mdio-mii:01: port=2, mode=0, link=1
Jan 22 16:02:34 localhost kernel: [    9.213919] mv88e6085 
d0032004.mdio-mii:01 lan0: Link is Up - 1Gbps/Full - flow control rx/tx
Jan 22 16:02:34 localhost kernel: [    9.213962] IPv6: 
ADDRCONF(NETDEV_CHANGE): lan0: link becomes ready
Jan 22 16:02:34 localhost kernel: [   11.356713] br0: port 2(lan0) 
entered blocking state
Jan 22 16:02:34 localhost kernel: [   11.356719] br0: port 2(lan0) 
entered forwarding state
Jan 22 16:02:34 localhost kernel: [   11.359228] mv88e6085 
d0032004.mdio-mii:01: p2: PortState set to Blocking/Listening
Jan 22 16:02:34 localhost kernel: [   11.360339] mv88e6085 
d0032004.mdio-mii:01: p2: PortState set to Forwarding
Jan 22 16:02:34 localhost systemd-networkd[358]: lan0: Flags change: 
+LOWER_UP +RUNNING
Jan 22 16:02:34 localhost systemd-networkd[358]: lan0: Gained carrier
Jan 22 16:02:34 localhost systemd-networkd[358]: lan0: Configured
Andrew Lunn Jan. 22, 2019, 10:36 p.m. UTC | #3
On Tue, Jan 22, 2019 at 04:40:27PM -0500, John David Anglin wrote:
> Hi Andrew,
> 
> On 1/22/2019 3:28 PM, Andrew Lunn wrote:
> >Does it make a difference if you do it by hand? Bring up the master
> >interface, wan, lan0, lan1, add any bridge you need, etc.
> >
> >> From power on, none of the wan, lan0, lan1 or br0 achieve link
> >>(LOWER_UP).
> I can explore this but I don't know at the moment.  I believe that all
> interfaces are configured.

Hi John

In what order? The master interface has to be up first before any
slave interface is configured up.

> In serdes.c, interrupts are used to monitor link changes.  However, phy.c
> doesn't do this and
> it doesn't call phylink_mac_change().

It does not need to. There are two options here:

1) The PHY has no interrupt. phylib will poll the PHY once per second
   for link changes.

2) The PHY has in interrupt. Link changes will cause the interrupt to
   fire, and the phylib will then read the current state.

For PHYs embedded within a switch driver by mv88e6xxx interrupts
should always be used.

I will do some testing on my espressobin.

  Andrew
Andrew Lunn Jan. 22, 2019, 11:12 p.m. UTC | #4
On Tue, Jan 22, 2019 at 02:16:09PM -0500, John David Anglin wrote:
> I've been hacking on a espressobin board to try to improve ptp support,
> etc.  However, I have
> a big problem with link detection on the wan, lan0 and lan1 ports.

Hi John

I just booted my espressobin with net-next. It is running Debian, and
i have the following in /etc/network/interfaces

source /etc/network/interfaces.d/*

# The loopback network interface
auto lo
iface lo inet loopback

# The primary network interface
allow-hotplug wan
iface wan inet dhcp
  pre-up ip link set eth0 up

allow-hotplug lan0
iface lan0 inet static
  pre-up ip link set eth0 up
  address 10.42.42.42
  netmask 255.255.255.0

my wan port got its IP address from DHCP.

root@espressobin:~# ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group defaul
t qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1508 qdisc mq state UP group defa
ult qlen 1024
    link/ether f0:ad:4e:03:69:9c brd ff:ff:ff:ff:ff:ff
    inet6 fe80::f2ad:4eff:fe03:699c/64 scope link 
       valid_lft forever preferred_lft forever
3: wan@eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP g
roup default qlen 1000
    link/ether f0:ad:4e:03:69:9c brd ff:ff:ff:ff:ff:ff
    inet 10.0.0.11/24 brd 10.0.0.255 scope global wan
       valid_lft forever preferred_lft forever
    inet6 fe80::f2ad:4eff:fe03:699c/64 scope link 
       valid_lft forever preferred_lft forever
4: lan0@eth0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state L
OWERLAYERDOWN group default qlen 1000
    link/ether f0:ad:4e:03:69:9c brd ff:ff:ff:ff:ff:ff
    inet 10.42.42.42/24 brd 10.42.42.255 scope global lan0
       valid_lft forever preferred_lft forever
5: lan1@eth0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default
 qlen 1000
    link/ether f0:ad:4e:03:69:9c brd ff:ff:ff:ff:ff:ff

lan0 is correctly down, because the machine on the other end is down.

I then manually configured lan1 up and powered on the peer. I then
see:

[  543.113227] IPv6: ADDRCONF(NETDEV_UP): lan1: link is not ready
[  546.680276] mv88e6085 d0032004.mdio-mii:01 lan1: Link is Up - 1Gbps/Full - fl
ow control off
[  546.686106] IPv6: ADDRCONF(NETDEV_CHANGE): lan1: link becomes ready

So for me, everything is working as it should.

I would suggest you manually configure your networking, just to
test. I would suspect systemd is not doing things correctly.

	Andrew
John David Anglin Jan. 22, 2019, 11:48 p.m. UTC | #5
On 2019-01-22 6:12 p.m., Andrew Lunn wrote:
> I would suggest you manually configure your networking, just to
> test. I would suspect systemd is not doing things correctly.
I have armbian with debian stretch installed.  I did find that both
systemd-networkd and
networkmanager were enabled.  I disabled the networkmanager service. 
However, DHCP
on br0 didn't always get an IP.  It was after disabling networkmanager
that the ports didn't
come up on boot.

Dave
John David Anglin Jan. 22, 2019, 11:52 p.m. UTC | #6
On 2019-01-22 5:36 p.m., Andrew Lunn wrote:
> In what order? The master interface has to be up first before any
> slave interface is configured up.
Possibly, that's a clue.  If I recall, that's determined by the naming
on the .network files in
/etc/systemd/network.  The slave interfaces are coming up in reverse
order: lan1->lan0->wan.
Need to check eth0 relative to the others.

Dave
John David Anglin Jan. 23, 2019, midnight UTC | #7
On 2019-01-22 6:12 p.m., Andrew Lunn wrote:
> I just booted my espressobin with net-next. It is running Debian, and
> i have the following in /etc/network/interfaces
>
> source /etc/network/interfaces.d/*
>
> # The loopback network interface
> auto lo
> iface lo inet loopback
>
> # The primary network interface
> allow-hotplug wan
> iface wan inet dhcp
>   pre-up ip link set eth0 up
>
> allow-hotplug lan0
> iface lan0 inet static
>   pre-up ip link set eth0 up
>   address 10.42.42.42
>   netmask 255.255.255.0
>
> my wan port got its IP address from DHCP.
Your configuration is different.  I have the wan, lan0 and lan1 ports
configured as a bridge and they
don't get IP addresses.  The only port that gets an IP is br0.

Dave
Florian Fainelli Jan. 23, 2019, 12:04 a.m. UTC | #8
On 1/22/19 4:00 PM, John David Anglin wrote:
> On 2019-01-22 6:12 p.m., Andrew Lunn wrote:
>> I just booted my espressobin with net-next. It is running Debian, and
>> i have the following in /etc/network/interfaces
>>
>> source /etc/network/interfaces.d/*
>>
>> # The loopback network interface
>> auto lo
>> iface lo inet loopback
>>
>> # The primary network interface
>> allow-hotplug wan
>> iface wan inet dhcp
>>   pre-up ip link set eth0 up
>>
>> allow-hotplug lan0
>> iface lan0 inet static
>>   pre-up ip link set eth0 up
>>   address 10.42.42.42
>>   netmask 255.255.255.0
>>
>> my wan port got its IP address from DHCP.
> Your configuration is different.  I have the wan, lan0 and lan1 ports
> configured as a bridge and they
> don't get IP addresses.  The only port that gets an IP is br0.

If you bridge all devices together that's expected. If you leave the
network devices interfaces outside of the bridge, then they act as
individual/standalone network ports, and that is where a DHCP client
would be running.

The point Andrew as trying to make is that if you make sure you always
bring-up the DSA master network device, typically ethX first, and then
the DSA network interfaces that are per-port, then you should not have
any issues with link being UP/DOWN. Whether IP stack works is a
different thing.
John David Anglin Jan. 23, 2019, 12:11 a.m. UTC | #9
On 2019-01-22 5:36 p.m., Andrew Lunn wrote:
> It does not need to. There are two options here:
>
> 1) The PHY has no interrupt. phylib will poll the PHY once per second
>    for link changes.
>
> 2) The PHY has in interrupt. Link changes will cause the interrupt to
>    fire, and the phylib will then read the current state.
>
> For PHYs embedded within a switch driver by mv88e6xxx interrupts
> should always be used.
I don't think option 2) is implemented.  Didn't see any irq code in phy.c.

If I remember correctly, one needs to use clause 45 accesses to get at
the PHY registers in the 88E6341.

Can you point me to the phylib code that does the polling?

Thanks,
Dave
Andrew Lunn Jan. 23, 2019, 12:22 a.m. UTC | #10
> > It does not need to. There are two options here:
> >
> > 1) The PHY has no interrupt. phylib will poll the PHY once per second
> >    for link changes.
> >
> > 2) The PHY has in interrupt. Link changes will cause the interrupt to
> >    fire, and the phylib will then read the current state.
> >
> > For PHYs embedded within a switch driver by mv88e6xxx interrupts
> > should always be used.

Hi Dave

From my Espressobin

cat /proc/interrupts
...
 44:          0          0  mv88e6xxx-g1   3 Edge      mv88e6xxx-g1-atu-prob
 46:          0          0  mv88e6xxx-g1   5 Edge      mv88e6xxx-g1-vtu-prob
 48:         38         24  mv88e6xxx-g1   7 Edge      mv88e6xxx-g2
 51:          0          1  mv88e6xxx-g2   1 Edge      !soc!internal-regs@d0000000!mdio@32004!switch0@1!mdio:11
 52:          0          0  mv88e6xxx-g2   2 Edge      !soc!internal-regs@d0000000!mdio@32004!switch0@1!mdio:12
 53:         38         23  mv88e6xxx-g2   3 Edge      !soc!internal-regs@d0000000!mdio@32004!switch0@1!mdio:13

These are PHY interrupts.

> I don't think option 2) is implemented.  Didn't see any irq code in phy.c.

You would not. All the interrupt code is in the PHY core and the PHY
driver. drivers/net/dsa/mv88e6xxx/phy.c is just a bunch of helpers
which allow the mdio bus driver to access phy registers. The PHY
driver itself is drivers/net/phy/marvell.c, and the interrupt handling
is spread between that and drivers/net/phy/phy.c

> If I remember correctly, one needs to use clause 45 accesses to get at
> the PHY registers in the 88E6341.

Nope. The PHYs are c22 devices. The SERDES are probably C45, but those
are not being used here.

      Andrew
John David Anglin Jan. 25, 2019, 4:30 p.m. UTC | #11
On 2019-01-22 7:22 p.m., Andrew Lunn wrote:
> >From my Espressobin
>
> cat /proc/interrupts
> ...
>  44:          0          0  mv88e6xxx-g1   3 Edge      mv88e6xxx-g1-atu-prob
>  46:          0          0  mv88e6xxx-g1   5 Edge      mv88e6xxx-g1-vtu-prob
>  48:         38         24  mv88e6xxx-g1   7 Edge      mv88e6xxx-g2
>  51:          0          1  mv88e6xxx-g2   1 Edge      !soc!internal-regs@d0000000!mdio@32004!switch0@1!mdio:11
>  52:          0          0  mv88e6xxx-g2   2 Edge      !soc!internal-regs@d0000000!mdio@32004!switch0@1!mdio:12
>  53:         38         23  mv88e6xxx-g2   3 Edge      !soc!internal-regs@d0000000!mdio@32004!switch0@1!mdio:13
>
> These are PHY interrupts.
>
Thanks.  That was the clue.  I had tried to enable hardware for support
for switch interrupts.  However,
the espressobin connects the switch interrupt output to a southbridge
pin that only supports edge interrupts.
Link detection works okay once the attached change is removed.  I may
try and see if moving the interrupt
to a northbridge pin works.

I have to wonder about the use of edge interrupts.  Given that a GPIO
interrupt isn't defined for the mv88e6xxx,
then the interrupt must be done by polling the global1 control register.

This also makes me wonder about this SD interrupt:
43:          0          0     GPIO1   3 Edge      d00d0000.sdhci cd
Many poeple have trouble with SD cards on reboot.

Although the driver doesn't handle AVB interrupts, I started down this
path to try get ptp4l working better.  I
have tweaked the polling but there are still circumstances where
timestamps are overwritten (or not written).

Dave
Russell King (Oracle) Jan. 25, 2019, 4:48 p.m. UTC | #12
On Fri, Jan 25, 2019 at 11:30:54AM -0500, John David Anglin wrote:
> This also makes me wonder about this SD interrupt:
> 43:          0          0     GPIO1   3 Edge      d00d0000.sdhci cd
> Many poeple have trouble with SD cards on reboot.

[off topic, but a relevant reply to the above]

Very likely unrelated to edge interrupts.

For a signal that goes low when a card is inserted and high when
removed, you really need an edge interrupt to avoid flooding the CPU
with a "stuck" interrupt - if you were to use an active low interrupt
for card detection, and the signal is pulled low by card insertion,
the interrupt remains active until you remove the card!

In any case, I've seen the same with some early SolidRun platforms
when using high speed modes.

Most issues with SD cards at reboot is where the card is left in low
voltage/high speed mode, the reboot happens, and then you try to talk
to it in low-speed mode (as required by the specs for card detection/
initialisation) and they don't respond.  The only way out of that is
to reset the card - and the only way to reset the card is to remove
power from it.

If you're lucky enough to have a "regulator" which can be switched
off at reboot, that may not be sufficient - if there's a capacitor on
the SD card's supply line downstream from the regulator, the card can
be drawing soo little current that the voltage doesn't fall
sufficiently to cause the card to power down.

If you don't have any means to reset the card like that, then the only
workaround is to reduce the maximum speed - and avoid going to 1.8V
signalling mode.
John David Anglin Jan. 25, 2019, 6:38 p.m. UTC | #13
On 2019-01-25 11:48 a.m., Russell King - ARM Linux admin wrote:
> If you don't have any means to reset the card like that, then the only
> workaround is to reduce the maximum speed - and avoid going to 1.8V
> signalling mode.
There is no power controller on espressobin.  The regulator only
switches the I/O voltage.

The attached change fixed reboot on espressobin with problematic SD card.

Thanks,
Dave
John David Anglin Jan. 30, 2019, 5:08 p.m. UTC | #14
On 2019-01-22 7:22 p.m., Andrew Lunn wrote:
> >From my Espressobin
>
> cat /proc/interrupts
> ...
>  44:          0          0  mv88e6xxx-g1   3 Edge      mv88e6xxx-g1-atu-prob
>  46:          0          0  mv88e6xxx-g1   5 Edge      mv88e6xxx-g1-vtu-prob
>  48:         38         24  mv88e6xxx-g1   7 Edge      mv88e6xxx-g2
>  51:          0          1  mv88e6xxx-g2   1 Edge      !soc!internal-regs@d0000000!mdio@32004!switch0@1!mdio:11
>  52:          0          0  mv88e6xxx-g2   2 Edge      !soc!internal-regs@d0000000!mdio@32004!switch0@1!mdio:12
>  53:         38         23  mv88e6xxx-g2   3 Edge      !soc!internal-regs@d0000000!mdio@32004!switch0@1!mdio:13
>
> These are PHY interrupts.
If we come back to my trying to use the INTn pin on the esspressobin, I
have found that clearing and resetting the interrupt
enable bits in the global control register (offset 0x4) restarts link
detection when the device is stuck.  This suggests that the
INTn connection to MPP2_23 is low when the the GIC interrupt is enabled
on this pin.  Possibly, this is caused by the fact
that EEIntEn is set to 1 on reset.  INTn then goes low when EEPROM
loading is done.  Another possibility might be race conditions
in processing interrupts.

Thoughts?

Dave
Andrew Lunn Jan. 30, 2019, 5:28 p.m. UTC | #15
On Wed, Jan 30, 2019 at 12:08:39PM -0500, John David Anglin wrote:
> On 2019-01-22 7:22 p.m., Andrew Lunn wrote:
> > >From my Espressobin
> >
> > cat /proc/interrupts
> > ...
> >  44:          0          0  mv88e6xxx-g1   3 Edge      mv88e6xxx-g1-atu-prob
> >  46:          0          0  mv88e6xxx-g1   5 Edge      mv88e6xxx-g1-vtu-prob
> >  48:         38         24  mv88e6xxx-g1   7 Edge      mv88e6xxx-g2
> >  51:          0          1  mv88e6xxx-g2   1 Edge      !soc!internal-regs@d0000000!mdio@32004!switch0@1!mdio:11
> >  52:          0          0  mv88e6xxx-g2   2 Edge      !soc!internal-regs@d0000000!mdio@32004!switch0@1!mdio:12
> >  53:         38         23  mv88e6xxx-g2   3 Edge      !soc!internal-regs@d0000000!mdio@32004!switch0@1!mdio:13
> >
> > These are PHY interrupts.
> If we come back to my trying to use the INTn pin on the esspressobin, I
> have found that clearing and resetting the interrupt
> enable bits in the global control register (offset 0x4) restarts link
> detection when the device is stuck.  This suggests that the
> INTn connection to MPP2_23 is low when the the GIC interrupt is enabled
> on this pin.  Possibly, this is caused by the fact
> that EEIntEn is set to 1 on reset.  INTn then goes low when EEPROM
> loading is done.  Another possibility might be race conditions
> in processing interrupts.
> 
> Thoughts?

Hi David

You need active low interrupts. Without it, i think you are always
going to have race conditions which will cause interrupts to get
stuck/lost.

I would suggest you remove the interrupt from your device tree and use
the mv88e6xxx polling method. If i remember correctly, it currently
polls 10 per second, so PHY link up/down is going to be 5 times faster
on average than when phylib is polling the PHY.

   Andrew
John David Anglin Jan. 30, 2019, 7:01 p.m. UTC | #16
On 2019-01-30 12:28 p.m., Andrew Lunn wrote:
> On Wed, Jan 30, 2019 at 12:08:39PM -0500, John David Anglin wrote:
>> On 2019-01-22 7:22 p.m., Andrew Lunn wrote:
>>> >From my Espressobin
>>>
>>> cat /proc/interrupts
>>> ...
>>>  44:          0          0  mv88e6xxx-g1   3 Edge      mv88e6xxx-g1-atu-prob
>>>  46:          0          0  mv88e6xxx-g1   5 Edge      mv88e6xxx-g1-vtu-prob
>>>  48:         38         24  mv88e6xxx-g1   7 Edge      mv88e6xxx-g2
>>>  51:          0          1  mv88e6xxx-g2   1 Edge      !soc!internal-regs@d0000000!mdio@32004!switch0@1!mdio:11
>>>  52:          0          0  mv88e6xxx-g2   2 Edge      !soc!internal-regs@d0000000!mdio@32004!switch0@1!mdio:12
>>>  53:         38         23  mv88e6xxx-g2   3 Edge      !soc!internal-regs@d0000000!mdio@32004!switch0@1!mdio:13
>>>
>>> These are PHY interrupts.
>> If we come back to my trying to use the INTn pin on the esspressobin, I
>> have found that clearing and resetting the interrupt
>> enable bits in the global control register (offset 0x4) restarts link
>> detection when the device is stuck.  This suggests that the
>> INTn connection to MPP2_23 is low when the the GIC interrupt is enabled
>> on this pin.  Possibly, this is caused by the fact
>> that EEIntEn is set to 1 on reset.  INTn then goes low when EEPROM
>> loading is done.  Another possibility might be race conditions
>> in processing interrupts.
>>
>> Thoughts?
> Hi David
>
> You need active low interrupts. Without it, i think you are always
> going to have race conditions which will cause interrupts to get
> stuck/lost.
>
> I would suggest you remove the interrupt from your device tree and use
> the mv88e6xxx polling method. If i remember correctly, it currently
> polls 10 per second, so PHY link up/down is going to be 5 times faster
> on average than when phylib is polling the PHY.
Hi Andrew,

The main motivation in doing this is to try to enable the AVB interrupt
and to improve the PTP support.
I agree that polling is perfectly fine for PHY link interrupts. 
Possibly, ATU and VTU might need faster
support but I'm not using that at the moment.

I have hacked on the time stamp code quit a bit to try and improve
things but there are still issues with
lost or overwritten time stamps:

Jan 28 11:15:05 localhost kernel: [234850.840883] mv88e6085
d0032004.mdio-mii:01
: timestamp discarded
Jan 28 11:15:05 localhost ptp4l: [234852.998] port 3: received SYNC
without time
stamp

I think when PTP packets other than PDELAY are too closely spaced, we
have problems accessing
the timestamp quick enough.  Also, timestamp access is dependent on CPU
speed and HZ.

It looks like I can easily connect MPP2_23 to MPP1_16 on the edge
connector P8.  I believe the northbridge
pins support level interrupts.

In /proc/interrupts, the switch interrupts are shown as edge.  The only
place that I see where this
is potentially set is mv88e6xxx_g2_watchdog_setup() where the call to
request_threaded_irq() passes
"IRQF_ONESHOT | IRQF_TRIGGER_FALLING".  Does this need to change?

Dave
Andrew Lunn Jan. 30, 2019, 7:09 p.m. UTC | #17
> In /proc/interrupts, the switch interrupts are shown as edge.  The only
> place that I see where this
> is potentially set is mv88e6xxx_g2_watchdog_setup() where the call to
> request_threaded_irq() passes
> "IRQF_ONESHOT | IRQF_TRIGGER_FALLING".  Does this need to change?

Hi Dave

Device tree determines how the interrupt is configured.  It is the
second part of the interrupts property.

       Andrew
John David Anglin Jan. 30, 2019, 10:24 p.m. UTC | #18
On 2019-01-30 12:28 p.m., Andrew Lunn wrote:
> You need active low interrupts. Without it, i think you are always
> going to have race conditions which will cause interrupts to get
> stuck/lost.
I don't know if this is a hardware limitation or not, but currently the
armada 37xx doesn't support
level interrupts:

[    4.013280] genirq: Setting trigger mode 8 for irq 44 failed
(armada_37xx_irq_set_type+0x0/0x158)
[    4.014075] mv88e6085: probe of d0032004.mdio-mii:01 failed with
error -22

The function armada_37xx_irq_set_type() only supports edge interrupts.

On the plus side, DTC no longer objects to level interrupts on southbridge.

Dave
Andrew Lunn Jan. 30, 2019, 10:38 p.m. UTC | #19
On Wed, Jan 30, 2019 at 05:24:53PM -0500, John David Anglin wrote:
> On 2019-01-30 12:28 p.m., Andrew Lunn wrote:
> > You need active low interrupts. Without it, i think you are always
> > going to have race conditions which will cause interrupts to get
> > stuck/lost.
> I don't know if this is a hardware limitation or not, but currently the
> armada 37xx doesn't support
> level interrupts:

Yes, i had a quick look at the pinctrl driver. It only has code for
edges.

I'd suggest you take a look at the datasheet for the 37xx and check
what the hardware actually supports. You might need to extend the
driver.

	Andrew
John David Anglin Jan. 31, 2019, 1:27 a.m. UTC | #20
On 2019-01-30 5:38 p.m., Andrew Lunn wrote:
> I'd suggest you take a look at the datasheet for the 37xx and check
> what the hardware actually supports. You might need to extend the
> driver.
I did look and the GIC does support level interrupts.  But all the
documentation is in
generic ARM documents that I don't currently have.  I'll see if I can
find them tomorrow.

Dave
John David Anglin Jan. 31, 2019, 5:27 p.m. UTC | #21
On 2019-01-30 8:27 p.m., John David Anglin wrote:
> On 2019-01-30 5:38 p.m., Andrew Lunn wrote:
>> I'd suggest you take a look at the datasheet for the 37xx and check
>> what the hardware actually supports. You might need to extend the
>> driver.
> I did look and the GIC does support level interrupts.  But all the
> documentation is in
> generic ARM documents that I don't currently have.  I'll see if I can
> find them tomorrow.
On a closer look at MV-S110897-00C, I see that the north and south
bridge GPIO interrupt registers
only provide edge polarity control.  The GPIO pins don't appear to
support level interrupts on 88F37xx.

Dave
diff mbox series

Patch

diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c
index 9b8dd0d0ee42..c1ec13b320ee 100644
--- a/drivers/net/phy/phylink.c
+++ b/drivers/net/phy/phylink.c
@@ -405,6 +405,7 @@  static void phylink_resolve(struct work_struct *w)
 		case MLO_AN_PHY:
 			link_state = pl->phy_state;
 			phylink_resolve_flow(pl, &link_state);
+			phylink_get_mac_state(pl, &link_state);
 			phylink_mac_config(pl, &link_state);
 			break;