regression in ixgbe SFP detection patch

Message ID 20151111173527.GA3641@gandi.net
State RFC, archived
Delegated to: David Miller

Commit Message

William Dauchy Nov. 11, 2015, 5:35 p.m. UTC
Hello,

I upgraded a machine from 3.14.x to v4.1.x and noticed that two kworker
threads are now very often in D state right after boot, while the machine
is not doing anything special. The issue persists indefinitely.
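
For readers who want to check for the same symptom, the sketch below shows one way to test whether a task is in uninterruptible sleep (state D) by reading /proc/<pid>/stat. It is not part of the original report: the helper names task_state_from_stat and task_is_uninterruptible are made up for illustration, and it assumes a Linux /proc filesystem.

```c
#include <stdio.h>
#include <string.h>

/*
 * Return the state character from one line of /proc/<pid>/stat, or '\0'
 * if the line is malformed.  The format is "pid (comm) state ...".  The
 * comm field may itself contain spaces or parentheses, so search for the
 * LAST ')' rather than the first.
 */
char task_state_from_stat(const char *stat_line)
{
	const char *close = strrchr(stat_line, ')');

	if (!close || close[1] != ' ' || close[2] == '\0')
		return '\0';
	return close[2];
}

/*
 * Read /proc/<pid>/stat and report whether the task is in state D.
 * Returns 1 if it is, 0 if it is not, -1 if the file cannot be opened.
 */
int task_is_uninterruptible(int pid)
{
	char path[64], line[512];
	FILE *f;
	int is_d = 0;

	snprintf(path, sizeof(path), "/proc/%d/stat", pid);
	f = fopen(path, "r");
	if (!f)
		return -1;	/* task gone or /proc unavailable */
	if (fgets(line, sizeof(line), f))
		is_d = task_state_from_stat(line) == 'D';
	fclose(f);
	return is_d;
}
```

Calling task_is_uninterruptible() with the PID of each kworker thread would return 1 for those currently in D state. A single sample is only a snapshot, so a task that is "very often" in D state needs to be polled repeatedly.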

This machine has four network interfaces:


01:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
        Subsystem: Inventec Corporation Device 004a
        Flags: bus master, fast devsel, latency 0, IRQ 17
        Memory at fbce0000 (32-bit, non-prefetchable) [size=128K]
        Memory at fbcc0000 (32-bit, non-prefetchable) [size=128K]
        I/O ports at cc00 [size=32]
        Memory at fbc9c000 (32-bit, non-prefetchable) [size=16K]
        Expansion ROM at fbca0000 [disabled] [size=128K]
        Capabilities: [40] Power Management version 3
        Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
        Capabilities: [70] MSI-X: Enable+ Count=10 Masked-
        Capabilities: [a0] Express Endpoint, MSI 00
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [140] Device Serial Number 00-26-6c-ff-ff-ff-af-71
        Capabilities: [150] Alternative Routing-ID Interpretation (ARI)
        Capabilities: [160] Single Root I/O Virtualization (SR-IOV)
        Kernel driver in use: igb

01:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
        Subsystem: Inventec Corporation Device 004a
        Flags: bus master, fast devsel, latency 0, IRQ 16
        Memory at fbc20000 (32-bit, non-prefetchable) [size=128K]
        Memory at fbc00000 (32-bit, non-prefetchable) [size=128K]
        I/O ports at c880 [size=32]
        Memory at fbbdc000 (32-bit, non-prefetchable) [size=16K]
        Expansion ROM at fbbe0000 [disabled] [size=128K]
        Capabilities: [40] Power Management version 3
        Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
        Capabilities: [70] MSI-X: Enable+ Count=10 Masked-
        Capabilities: [a0] Express Endpoint, MSI 00
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [140] Device Serial Number 00-26-6c-ff-ff-ff-af-71
        Capabilities: [150] Alternative Routing-ID Interpretation (ARI)
        Capabilities: [160] Single Root I/O Virtualization (SR-IOV)
        Kernel driver in use: igb

03:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
        Subsystem: Inventec Corporation Device 004c
        Flags: bus master, fast devsel, latency 0, IRQ 56
        Memory at fbdc0000 (64-bit, non-prefetchable) [size=256K]
        I/O ports at dc00 [size=32]
        Memory at fbd9c000 (64-bit, non-prefetchable) [size=16K]
        Expansion ROM at fbda0000 [disabled] [size=128K]
        Capabilities: [40] Power Management version 3
        Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
        Capabilities: [70] MSI-X: Enable+ Count=64 Masked-
        Capabilities: [a0] Express Endpoint, MSI 00
        Capabilities: [e0] Vital Product Data
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [140] Device Serial Number 00-8c-fa-ff-ff-01-cf-c2
        Capabilities: [150] Alternative Routing-ID Interpretation (ARI)
        Capabilities: [160] Single Root I/O Virtualization (SR-IOV)
        Kernel driver in use: ixgbe

03:00.1 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
        Subsystem: Inventec Corporation Device 004c
        Flags: bus master, fast devsel, latency 0, IRQ 82
        Memory at fbd40000 (64-bit, non-prefetchable) [size=256K]
        I/O ports at d880 [size=32]
        Memory at fbd1c000 (64-bit, non-prefetchable) [size=16K]
        Expansion ROM at fbd20000 [disabled] [size=128K]
        Capabilities: [40] Power Management version 3
        Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
        Capabilities: [70] MSI-X: Enable+ Count=64 Masked-
        Capabilities: [a0] Express Endpoint, MSI 00
        Capabilities: [e0] Vital Product Data
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [140] Device Serial Number 00-8c-fa-ff-ff-01-cf-c2
        Capabilities: [150] Alternative Routing-ID Interpretation (ARI)
        Capabilities: [160] Single Root I/O Virtualization (SR-IOV)
        Kernel driver in use: ixgbe


The two ixgbe interfaces are not used (UP but no-carrier):

2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UP group defa
    link/ether 00:26:6c:ff:af:70 brd ff:ff:ff:ff:ff:ff
    inet 10.5.5.58/24 brd 10.5.5.255 scope global eth0
       valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UP group defa
    link/ether 00:26:6c:ff:af:71 brd ff:ff:ff:ff:ff:ff
4: eth2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group 
    link/ether 00:8c:fa:01:cf:c2 brd ff:ff:ff:ff:ff:ff
5: eth3: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group 
    link/ether 00:8c:fa:01:cf:c3 brd ff:ff:ff:ff:ff:ff


If I bring them down (ip link set dev eth{2,3} down), the problem
disappears, and so do the two kworkers in D state.

Since only the kernel version changed, I consider this a regression, so
I bisected in order to locate the issue.

The bisection (between v3.14.x and v4.1.x) ended at:
# first bad commit: [d9cd46cd391a132a43cbde7bdac12c16284b618f] ixgbe: fix detection of SFP+ capable interfaces

After some tests, I reverted the only hunk of that commit that touches
ixgbe_main.c (the hunk shown under "Patch" below).

It also fixes my issue: even if eth{2,3} are still up with no carrier, I
don't have any kworker in D state.


So, should this be considered a regression? If so, I can send a formal
patch. Or do you need more information to help you debug it?


Thanks,

Comments

Tantilov, Emil S Nov. 11, 2015, 8:33 p.m. UTC | #1
>-----Original Message-----
>From: William Dauchy [mailto:william@gandi.net]
>Sent: Wednesday, November 11, 2015 9:35 AM
>To: Kirsher, Jeffrey T; Tantilov, Emil S
>Cc: davem@davemloft.net; netdev@vger.kernel.org; Schmitt, Phillip J; intel-
>wired-lan@lists.osuosl.org
>Subject: regression in ixgbe SFP detection patch
>
>Hello,
>
>I upgraded a machine from 3.14.x to v4.1.x and noted that I now have two
>kworker very often on D state, just after boot while I am not doing
>anything special. This issue remains indefinitely.
>
>This machine has four network interfaces:
>
>
>01:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network
>Connection (rev 01)
>        Subsystem: Inventec Corporation Device 004a
>        Flags: bus master, fast devsel, latency 0, IRQ 17
>        Memory at fbce0000 (32-bit, non-prefetchable) [size=128K]
>        Memory at fbcc0000 (32-bit, non-prefetchable) [size=128K]
>        I/O ports at cc00 [size=32]
>        Memory at fbc9c000 (32-bit, non-prefetchable) [size=16K]
>        Expansion ROM at fbca0000 [disabled] [size=128K]
>        Capabilities: [40] Power Management version 3
>        Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
>        Capabilities: [70] MSI-X: Enable+ Count=10 Masked-
>        Capabilities: [a0] Express Endpoint, MSI 00
>        Capabilities: [100] Advanced Error Reporting
>        Capabilities: [140] Device Serial Number 00-26-6c-ff-ff-ff-af-71
>        Capabilities: [150] Alternative Routing-ID Interpretation (ARI)
>        Capabilities: [160] Single Root I/O Virtualization (SR-IOV)
>        Kernel driver in use: igb
>
>01:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network
>Connection (rev 01)
>        Subsystem: Inventec Corporation Device 004a
>        Flags: bus master, fast devsel, latency 0, IRQ 16
>        Memory at fbc20000 (32-bit, non-prefetchable) [size=128K]
>        Memory at fbc00000 (32-bit, non-prefetchable) [size=128K]
>        I/O ports at c880 [size=32]
>        Memory at fbbdc000 (32-bit, non-prefetchable) [size=16K]
>        Expansion ROM at fbbe0000 [disabled] [size=128K]
>        Capabilities: [40] Power Management version 3
>        Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
>        Capabilities: [70] MSI-X: Enable+ Count=10 Masked-
>        Capabilities: [a0] Express Endpoint, MSI 00
>        Capabilities: [100] Advanced Error Reporting
>        Capabilities: [140] Device Serial Number 00-26-6c-ff-ff-ff-af-71
>        Capabilities: [150] Alternative Routing-ID Interpretation (ARI)
>        Capabilities: [160] Single Root I/O Virtualization (SR-IOV)
>        Kernel driver in use: igb
>
>03:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+
>Network Connection (rev 01)
>        Subsystem: Inventec Corporation Device 004c
>        Flags: bus master, fast devsel, latency 0, IRQ 56
>        Memory at fbdc0000 (64-bit, non-prefetchable) [size=256K]
>        I/O ports at dc00 [size=32]
>        Memory at fbd9c000 (64-bit, non-prefetchable) [size=16K]
>        Expansion ROM at fbda0000 [disabled] [size=128K]
>        Capabilities: [40] Power Management version 3
>        Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
>        Capabilities: [70] MSI-X: Enable+ Count=64 Masked-
>        Capabilities: [a0] Express Endpoint, MSI 00
>        Capabilities: [e0] Vital Product Data
>        Capabilities: [100] Advanced Error Reporting
>        Capabilities: [140] Device Serial Number 00-8c-fa-ff-ff-01-cf-c2
>        Capabilities: [150] Alternative Routing-ID Interpretation (ARI)
>        Capabilities: [160] Single Root I/O Virtualization (SR-IOV)
>        Kernel driver in use: ixgbe
>
>03:00.1 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+
>Network Connection (rev 01)
>        Subsystem: Inventec Corporation Device 004c
>        Flags: bus master, fast devsel, latency 0, IRQ 82
>        Memory at fbd40000 (64-bit, non-prefetchable) [size=256K]
>        I/O ports at d880 [size=32]
>        Memory at fbd1c000 (64-bit, non-prefetchable) [size=16K]
>        Expansion ROM at fbd20000 [disabled] [size=128K]
>        Capabilities: [40] Power Management version 3
>        Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
>        Capabilities: [70] MSI-X: Enable+ Count=64 Masked-
>        Capabilities: [a0] Express Endpoint, MSI 00
>        Capabilities: [e0] Vital Product Data
>        Capabilities: [100] Advanced Error Reporting
>        Capabilities: [140] Device Serial Number 00-8c-fa-ff-ff-01-cf-c2
>        Capabilities: [150] Alternative Routing-ID Interpretation (ARI)
>        Capabilities: [160] Single Root I/O Virtualization (SR-IOV)
>        Kernel driver in use: ixgbe
>
>
>The two ixgbe interfaces are not used (UP but no-carrier):
>
>2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UP group
>defa
>    link/ether 00:26:6c:ff:af:70 brd ff:ff:ff:ff:ff:ff
>    inet 10.5.5.58/24 brd 10.5.5.255 scope global eth0
>       valid_lft forever preferred_lft forever
>3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UP group
>defa
>    link/ether 00:26:6c:ff:af:71 brd ff:ff:ff:ff:ff:ff
>4: eth2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN
>group
>    link/ether 00:8c:fa:01:cf:c2 brd ff:ff:ff:ff:ff:ff
>5: eth3: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN
>group
>    link/ether 00:8c:fa:01:cf:c3 brd ff:ff:ff:ff:ff:ff
>
>
>if I turn them down (ip link set dev eth{2,3} down); the problem
>disappear, the two kworker in D disappear as well.

The kworkers are needed to detect the SFP+ modules when they are plugged in.

>Since I consider this as a regression because I only change the kernel
>version, I did a bisection in order to localize the issue.

It is a change in behavior, and the commit below explains in good detail why
it is needed.

>What I got at the end is: (bisected between v3.14.x and v4.1.x)
># first bad commit: [d9cd46cd391a132a43cbde7bdac12c16284b618f] ixgbe: fix
>detection of SFP+ capable interfaces
>
>After some tests, I reverted the only part present in ixgbe_main:
>
>--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
>+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
>@@ -4786,8 +4786,6 @@
> 	case ixgbe_phy_qsfp_active_unknown:
> 	case ixgbe_phy_qsfp_intel:
> 	case ixgbe_phy_qsfp_unknown:
>-	/* ixgbe_phy_none is set when no SFP module is present */
>-	case ixgbe_phy_none:
> 		return true;
> 	case ixgbe_phy_nl:
> 		if (hw->mac.type == ixgbe_mac_82598EB)
>
>
>It also fixes my issue: even if eth{2,3} are still up with no carrier, I
>don't have any kworker in D state.

It appears that you have 2 ports with empty cages. If that is the case, there
is no reason to keep the interfaces up. If you bring them down, or plug in the
SFP+ modules, the kworkers should stop.

>
>
>So, is it something we should consider as a regression, in that case I
>can send a formal patch, or do you need some more information to help
>you debug it?

If the diff above is the patch you are referring to then you will break the
SFP+ detection in the case where the driver was loaded while there were no
SFP+ modules present in the cages.
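
To illustrate the point, here is a minimal user-space model (not driver code) of a switch shaped like the one in the hunk. Every MODEL_* name and model_is_sfp itself are hypothetical stand-ins. The sketch assumes that the real function's return value decides whether the driver treats the port as SFP-capable, and that the ixgbe_phy_nl case returns false for MAC types other than ixgbe_mac_82598EB; neither detail is visible in the diff context.

```c
#include <stdbool.h>

/* Hypothetical, cut-down stand-ins for the driver's enums. */
enum model_phy_type {
	MODEL_PHY_NONE,		/* no SFP module present (cf. ixgbe_phy_none) */
	MODEL_PHY_QSFP_UNKNOWN,	/* stand-in for any recognised module type */
	MODEL_PHY_NL,		/* cf. ixgbe_phy_nl */
	MODEL_PHY_OTHER,	/* stand-in for any non-SFP PHY */
};

enum model_mac_type { MODEL_MAC_82598EB, MODEL_MAC_OTHER };

/*
 * Mirrors the shape of the switch in the hunk.  treat_none_as_sfp selects
 * between the behaviour with the "case ixgbe_phy_none:" label (true) and
 * with that label reverted (false).
 */
bool model_is_sfp(enum model_phy_type phy, enum model_mac_type mac,
		  bool treat_none_as_sfp)
{
	switch (phy) {
	case MODEL_PHY_QSFP_UNKNOWN:
		return true;
	case MODEL_PHY_NONE:
		return treat_none_as_sfp;
	case MODEL_PHY_NL:
		/* Assumption: only the 82598EB MAC counts as SFP here. */
		return mac == MODEL_MAC_82598EB;
	default:
		return false;
	}
}
```

With treat_none_as_sfp set to false (the revert), a port whose cage was empty at load time is reported as not SFP-capable, which is the case described above where later module insertion would go undetected.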

Thanks,
Emil

>
>Thanks,
>--
>William
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

William Dauchy Nov. 11, 2015, 9:34 p.m. UTC | #2
On Nov11 20:33, Tantilov, Emil S wrote:
> If the diff above is the patch you are referring to then you will break the
> SFP+ detection in the case where the driver was loaded while there were no
> SFP+ modules present in the cages.

Understood. I was just surprised by the change in behavior.

Alexander H Duyck Nov. 11, 2015, 10:09 p.m. UTC | #3
On 11/11/2015 01:34 PM, William Dauchy wrote:
> On Nov11 20:33, Tantilov, Emil S wrote:
>> If the diff above is the patch you are referring to then you will break the
>> SFP+ detection in the case where the driver was loaded while there were no
>> SFP+ modules present in the cages.
> understood, I was surprised of the modification of behavior.

You might try testing against net-next to see if the problem still 
exists.  It looks like the code in question doesn't exist upstream as it 
was replaced in commit 45788d2af928 ("ixgbe: fix issue with sfp events 
with new X550 devices").

- Alex

Rustad, Mark D Nov. 11, 2015, 10:13 p.m. UTC | #4
William,

Emil S <emil.s.tantilov@intel.com> wrote:

>> It also fixes my issue: even if eth{2,3} are still up with no carrier, I
>> don't have any kworker in D state.
> 
> It appears that you have 2 ports with empty cages. If that is the case there
> is no reason to keep the interfaces up. If you bring them down, or plug the SFP+
> modules the kworkers should stop.
> 
>> So, is it something we should consider as a regression, in that case I
>> can send a formal patch, or do you need some more information to help
>> you debug it?
> 
> If the diff above is the patch you are referring to then you will break the
> SFP+ detection in the case where the driver was loaded while there were no
> SFP+ modules present in the cages.

Just so you know, there are patches in queue that will improve this situation in two ways:

1) When the I2C probe times out, the code assumes that the cage is empty and does not retry the access until the next probe.

2) The driver will use its own private workqueue, so it will not affect the system workqueues at all.
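
As a rough illustration of point 1 only, the following user-space sketch models "assume the cage is empty after a timeout and do not touch the bus again until the next probe". It is not taken from the queued patches: struct cage_model and all model_* functions are hypothetical, and the I2C access is simulated by a flag.

```c
#include <stdbool.h>

/* Hypothetical result codes for one simulated I2C access. */
enum probe_result { PROBE_OK, PROBE_TIMEOUT };

struct cage_model {
	bool assumed_empty;	/* set after a timeout, cleared on next probe */
	int i2c_accesses;	/* counts simulated bus accesses */
	bool module_present;	/* simulated hardware state */
};

/* One simulated I2C access: times out when no module is in the cage. */
static enum probe_result model_i2c_read(struct cage_model *c)
{
	c->i2c_accesses++;
	return c->module_present ? PROBE_OK : PROBE_TIMEOUT;
}

/* Start a new probe cycle: forget the previous "empty" verdict. */
void model_probe_begin(struct cage_model *c)
{
	c->assumed_empty = false;
}

/*
 * Attempt an access within the current probe cycle.  After the first
 * timeout the cage is assumed empty and further calls return immediately
 * without touching the bus.
 */
bool model_probe_access(struct cage_model *c)
{
	if (c->assumed_empty)
		return false;
	if (model_i2c_read(c) == PROBE_TIMEOUT) {
		c->assumed_empty = true;
		return false;
	}
	return true;
}
```

In this model, repeated accesses against an empty cage cost one timeout per probe cycle instead of one per access, which is the kind of saving point 1 describes.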

--
Mark Rustad, Networking Division, Intel Corporation

William Dauchy Nov. 12, 2015, 12:22 p.m. UTC | #5
On Nov11 22:13, Rustad, Mark D wrote:
> Just so you know, there are patches in queue that will improve this situation in two ways:
> 1) When the I2C probe times out, the code assumes that the cage is empty and does not retry the access until the next probe.
> 2) The driver will use its own private workqueue, so it will not affect the system workqueues at all.

Thanks, guys, for the details. I will have a look.

Patch

--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -4786,8 +4786,6 @@ 
 	case ixgbe_phy_qsfp_active_unknown:
 	case ixgbe_phy_qsfp_intel:
 	case ixgbe_phy_qsfp_unknown:
-	/* ixgbe_phy_none is set when no SFP module is present */
-	case ixgbe_phy_none:
 		return true;
 	case ixgbe_phy_nl:
 		if (hw->mac.type == ixgbe_mac_82598EB)