diff mbox

e1000e: MSI interrupt test failed, using legacy interrupt

Message ID 1335485150-2765-1-git-send-email-prasanna.panchamukhi@riverbed.com
State Not Applicable, archived
Delegated to: David Miller
Headers show

Commit Message

prasanna.panchamukhi@riverbed.com April 27, 2012, 12:05 a.m. UTC
From: Prasanna S. Panchamukhi <ppanchamukhi@riverbed.com>

Following logs where seen on Systems with multiple NICs & ports,
while using MSI interrupts as shown below:

Feb 16 15:09:32 (none) user.notice kernel: 0000:00:0d.0: lan0_0: NIC Link is Up
1000 Mbps Full Duplex, Flow Control: RX/TX
Feb 16 15:09:32 (none) user.notice kernel: 0000:40:0d.0: wan0_1: NIC Link is Up
1000 Mbps Full Duplex, Flow Control: RX/TX
Feb 16 15:09:32 (none) user.notice kernel: 0000:40:0d.0: lan0_1: NIC Link is Up
1000 Mbps Full Duplex, Flow Control: RX/TX
Feb 16 15:09:32 (none) user.warn kernel: 0000:40:0e.0: wan4_0: MSI interrupt
test failed, using legacy interrupt.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Feb 16 15:09:32 (none) user.notice kernel: 0000:00:0e.0: wan1_0: NIC Link is Up
1000 Mbps Full Duplex, Flow Control: RX/TX
Feb 16 15:09:33 (none) user.notice kernel: 0000:00:0e.0: lan1_0: NIC Link is Up
1000 Mbps Full Duplex, Flow Control: RX/TX
Feb 16 15:09:33 (none) user.notice kernel: 0000:00:0f.0: wan2_0: NIC Link is Up
1000 Mbps Full Duplex, Flow Control: RX/TX
Feb 16 15:09:33 (none) user.notice kernel: 0000:00:0f.0: lan2_0: NIC Link is Up
1000 Mbps Full Duplex, Flow Control: RX/TX
Feb 16 15:09:33 (none) user.notice kernel: 0000:40:0a.0: wan3_0: NIC Link is Up
1000 Mbps Full Duplex, Flow Control: RX/TX
Feb 16 15:09:33 (none) user.notice kernel: 0000:40:0a.0: lan3_0: NIC Link is Up
1000 Mbps Full Duplex, Flow Control: RX/TX
Feb 16 15:09:34 (none) user.notice kernel: 0000:40:0e.0: lan4_0: NIC Link is Up
1000 Mbps Full Duplex, Flow Control: RX/TX
Feb 16 15:09:34 (none) user.notice kernel: 0000:40:0f.0: wan5_0: NIC Link is Up
1000 Mbps Full Duplex, Flow Control: RX/TX
Feb 16 15:09:34 (none) user.notice kernel: 0000:40:0f.0: lan5_0: NIC Link is Up
1000 Mbps Full Duplex, Flow Control: RX/TX

This patch changes the IRQ tests to use polling loops starting with a
delay of 1 tick and doubling that if necessary up to a maximum total
delay of approximately 1 second.

Signed-off-by: Prasanna S. Panchamukhi <ppanchamukhi@riverbed.com>
---
 drivers/net/ethernet/intel/e1000e/netdev.c |   27 +++++++++++++++++++++++----
 1 files changed, 23 insertions(+), 4 deletions(-)

Comments

Jeff Kirsher April 27, 2012, 12:20 a.m. UTC | #1
On 04/26/2012 05:05 PM, prasanna.panchamukhi@riverbed.com wrote:
> From: Prasanna S. Panchamukhi <ppanchamukhi@riverbed.com>
>
> Following logs where seen on Systems with multiple NICs & ports,
> while using MSI interrupts as shown below:
>
> Feb 16 15:09:32 (none) user.notice kernel: 0000:00:0d.0: lan0_0: NIC Link is Up
> 1000 Mbps Full Duplex, Flow Control: RX/TX
> Feb 16 15:09:32 (none) user.notice kernel: 0000:40:0d.0: wan0_1: NIC Link is Up
> 1000 Mbps Full Duplex, Flow Control: RX/TX
> Feb 16 15:09:32 (none) user.notice kernel: 0000:40:0d.0: lan0_1: NIC Link is Up
> 1000 Mbps Full Duplex, Flow Control: RX/TX
> Feb 16 15:09:32 (none) user.warn kernel: 0000:40:0e.0: wan4_0: MSI interrupt
> test failed, using legacy interrupt.
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> Feb 16 15:09:32 (none) user.notice kernel: 0000:00:0e.0: wan1_0: NIC Link is Up
> 1000 Mbps Full Duplex, Flow Control: RX/TX
> Feb 16 15:09:33 (none) user.notice kernel: 0000:00:0e.0: lan1_0: NIC Link is Up
> 1000 Mbps Full Duplex, Flow Control: RX/TX
> Feb 16 15:09:33 (none) user.notice kernel: 0000:00:0f.0: wan2_0: NIC Link is Up
> 1000 Mbps Full Duplex, Flow Control: RX/TX
> Feb 16 15:09:33 (none) user.notice kernel: 0000:00:0f.0: lan2_0: NIC Link is Up
> 1000 Mbps Full Duplex, Flow Control: RX/TX
> Feb 16 15:09:33 (none) user.notice kernel: 0000:40:0a.0: wan3_0: NIC Link is Up
> 1000 Mbps Full Duplex, Flow Control: RX/TX
> Feb 16 15:09:33 (none) user.notice kernel: 0000:40:0a.0: lan3_0: NIC Link is Up
> 1000 Mbps Full Duplex, Flow Control: RX/TX
> Feb 16 15:09:34 (none) user.notice kernel: 0000:40:0e.0: lan4_0: NIC Link is Up
> 1000 Mbps Full Duplex, Flow Control: RX/TX
> Feb 16 15:09:34 (none) user.notice kernel: 0000:40:0f.0: wan5_0: NIC Link is Up
> 1000 Mbps Full Duplex, Flow Control: RX/TX
> Feb 16 15:09:34 (none) user.notice kernel: 0000:40:0f.0: lan5_0: NIC Link is Up
> 1000 Mbps Full Duplex, Flow Control: RX/TX
>
> This patch changes the IRQ tests to use polling loops starting with a
> delay of 1 tick and doubling that if necessary up to a maximum total
> delay of approximately 1 second.
>
> Signed-off-by: Prasanna S. Panchamukhi <ppanchamukhi@riverbed.com>
> ---
>  drivers/net/ethernet/intel/e1000e/netdev.c |   27 +++++++++++++++++++++++----
>  1 files changed, 23 insertions(+), 4 deletions(-)
Thanks Prasanna, Dave did accept the previous patch, so does this patch
take that into account?
Prasanna Panchamukhi April 27, 2012, 12:42 a.m. UTC | #2
On 04/26/2012 05:20 PM, Jeff Kirsher wrote:
> On 04/26/2012 05:05 PM, prasanna.panchamukhi@riverbed.com wrote:
>> From: Prasanna S. Panchamukhi<ppanchamukhi@riverbed.com>
>>
>> Following logs where seen on Systems with multiple NICs&  ports,
>> while using MSI interrupts as shown below:
>>
>> Feb 16 15:09:32 (none) user.notice kernel: 0000:00:0d.0: lan0_0: NIC Link is Up
>> 1000 Mbps Full Duplex, Flow Control: RX/TX
>> Feb 16 15:09:32 (none) user.notice kernel: 0000:40:0d.0: wan0_1: NIC Link is Up
>> 1000 Mbps Full Duplex, Flow Control: RX/TX
>> Feb 16 15:09:32 (none) user.notice kernel: 0000:40:0d.0: lan0_1: NIC Link is Up
>> 1000 Mbps Full Duplex, Flow Control: RX/TX
>> Feb 16 15:09:32 (none) user.warn kernel: 0000:40:0e.0: wan4_0: MSI interrupt
>> test failed, using legacy interrupt.
>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>> Feb 16 15:09:32 (none) user.notice kernel: 0000:00:0e.0: wan1_0: NIC Link is Up
>> 1000 Mbps Full Duplex, Flow Control: RX/TX
>> Feb 16 15:09:33 (none) user.notice kernel: 0000:00:0e.0: lan1_0: NIC Link is Up
>> 1000 Mbps Full Duplex, Flow Control: RX/TX
>> Feb 16 15:09:33 (none) user.notice kernel: 0000:00:0f.0: wan2_0: NIC Link is Up
>> 1000 Mbps Full Duplex, Flow Control: RX/TX
>> Feb 16 15:09:33 (none) user.notice kernel: 0000:00:0f.0: lan2_0: NIC Link is Up
>> 1000 Mbps Full Duplex, Flow Control: RX/TX
>> Feb 16 15:09:33 (none) user.notice kernel: 0000:40:0a.0: wan3_0: NIC Link is Up
>> 1000 Mbps Full Duplex, Flow Control: RX/TX
>> Feb 16 15:09:33 (none) user.notice kernel: 0000:40:0a.0: lan3_0: NIC Link is Up
>> 1000 Mbps Full Duplex, Flow Control: RX/TX
>> Feb 16 15:09:34 (none) user.notice kernel: 0000:40:0e.0: lan4_0: NIC Link is Up
>> 1000 Mbps Full Duplex, Flow Control: RX/TX
>> Feb 16 15:09:34 (none) user.notice kernel: 0000:40:0f.0: wan5_0: NIC Link is Up
>> 1000 Mbps Full Duplex, Flow Control: RX/TX
>> Feb 16 15:09:34 (none) user.notice kernel: 0000:40:0f.0: lan5_0: NIC Link is Up
>> 1000 Mbps Full Duplex, Flow Control: RX/TX
>>
>> This patch changes the IRQ tests to use polling loops starting with a
>> delay of 1 tick and doubling that if necessary up to a maximum total
>> delay of approximately 1 second.
>>
>> Signed-off-by: Prasanna S. Panchamukhi<ppanchamukhi@riverbed.com>
>> ---
>>   drivers/net/ethernet/intel/e1000e/netdev.c |   27 +++++++++++++++++++++++----
>>   1 files changed, 23 insertions(+), 4 deletions(-)
> Thanks Prasanna, Dave did accept the previous patch, so does this patch
> take that into account?
No, If Dave has already accepted previous patch we can ignore this patch 
for now.

Thanks,
Prasanna
>
>

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Kirsher, Jeffrey T April 27, 2012, 12:46 a.m. UTC | #3
On Thu, 2012-04-26 at 17:42 -0700, Prasanna Panchamukhi wrote:
> On 04/26/2012 05:20 PM, Jeff Kirsher wrote:
> > On 04/26/2012 05:05 PM, prasanna.panchamukhi@riverbed.com wrote:
> >> From: Prasanna S. Panchamukhi<ppanchamukhi@riverbed.com>
> >>
> >> Following logs where seen on Systems with multiple NICs&  ports,
> >> while using MSI interrupts as shown below:
> >>
> >> Feb 16 15:09:32 (none) user.notice kernel: 0000:00:0d.0: lan0_0: NIC Link is Up
> >> 1000 Mbps Full Duplex, Flow Control: RX/TX
> >> Feb 16 15:09:32 (none) user.notice kernel: 0000:40:0d.0: wan0_1: NIC Link is Up
> >> 1000 Mbps Full Duplex, Flow Control: RX/TX
> >> Feb 16 15:09:32 (none) user.notice kernel: 0000:40:0d.0: lan0_1: NIC Link is Up
> >> 1000 Mbps Full Duplex, Flow Control: RX/TX
> >> Feb 16 15:09:32 (none) user.warn kernel: 0000:40:0e.0: wan4_0: MSI interrupt
> >> test failed, using legacy interrupt.
> >> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> >> Feb 16 15:09:32 (none) user.notice kernel: 0000:00:0e.0: wan1_0: NIC Link is Up
> >> 1000 Mbps Full Duplex, Flow Control: RX/TX
> >> Feb 16 15:09:33 (none) user.notice kernel: 0000:00:0e.0: lan1_0: NIC Link is Up
> >> 1000 Mbps Full Duplex, Flow Control: RX/TX
> >> Feb 16 15:09:33 (none) user.notice kernel: 0000:00:0f.0: wan2_0: NIC Link is Up
> >> 1000 Mbps Full Duplex, Flow Control: RX/TX
> >> Feb 16 15:09:33 (none) user.notice kernel: 0000:00:0f.0: lan2_0: NIC Link is Up
> >> 1000 Mbps Full Duplex, Flow Control: RX/TX
> >> Feb 16 15:09:33 (none) user.notice kernel: 0000:40:0a.0: wan3_0: NIC Link is Up
> >> 1000 Mbps Full Duplex, Flow Control: RX/TX
> >> Feb 16 15:09:33 (none) user.notice kernel: 0000:40:0a.0: lan3_0: NIC Link is Up
> >> 1000 Mbps Full Duplex, Flow Control: RX/TX
> >> Feb 16 15:09:34 (none) user.notice kernel: 0000:40:0e.0: lan4_0: NIC Link is Up
> >> 1000 Mbps Full Duplex, Flow Control: RX/TX
> >> Feb 16 15:09:34 (none) user.notice kernel: 0000:40:0f.0: wan5_0: NIC Link is Up
> >> 1000 Mbps Full Duplex, Flow Control: RX/TX
> >> Feb 16 15:09:34 (none) user.notice kernel: 0000:40:0f.0: lan5_0: NIC Link is Up
> >> 1000 Mbps Full Duplex, Flow Control: RX/TX
> >>
> >> This patch changes the IRQ tests to use polling loops starting with a
> >> delay of 1 tick and doubling that if necessary up to a maximum total
> >> delay of approximately 1 second.
> >>
> >> Signed-off-by: Prasanna S. Panchamukhi<ppanchamukhi@riverbed.com>
> >> ---
> >>   drivers/net/ethernet/intel/e1000e/netdev.c |   27 +++++++++++++++++++++++----
> >>   1 files changed, 23 insertions(+), 4 deletions(-)
> > Thanks Prasanna, Dave did accept the previous patch, so does this patch
> > take that into account?
> No, If Dave has already accepted previous patch we can ignore this patch 
> for now.
> 
> Thanks,
> Prasanna

Yeah, he has accepted the previous patch so I will ignore this patch.
Ben Hutchings April 27, 2012, 1:40 a.m. UTC | #4
On Thu, 2012-04-26 at 17:05 -0700, prasanna.panchamukhi@riverbed.com
wrote:
> From: Prasanna S. Panchamukhi <ppanchamukhi@riverbed.com>
> 
> Following logs where seen on Systems with multiple NICs & ports,
> while using MSI interrupts as shown below:
[...]
> This patch changes the IRQ tests to use polling loops starting with a
> delay of 1 tick and doubling that if necessary up to a maximum total
> delay of approximately 1 second.
[...]

I don't think you entirely understood my point.  This exponential
back-off was the best I could come up with for a self-test
(ethtool -t) where we have to answer the question 'are interrupts
working' in a reasonable time, and the user can decide for themselves
whether this was a false negative.

If you want a definite answer in order to make an automatic decision
about interrupt modes, there really is no valid time limit.  I think the
proper way to do interrupt mode selection in an in-tree driver is to
rely on no-MSI quirks for chipsets and devices in the PCI core.

Ben.
Prasanna Panchamukhi April 27, 2012, 7:52 p.m. UTC | #5
On 04/26/2012 06:40 PM, Ben Hutchings wrote:
> On Thu, 2012-04-26 at 17:05 -0700, prasanna.panchamukhi@riverbed.com
> wrote:
>> From: Prasanna S. Panchamukhi<ppanchamukhi@riverbed.com>
>>
>> Following logs where seen on Systems with multiple NICs&  ports,
>> while using MSI interrupts as shown below:
> [...]
>> This patch changes the IRQ tests to use polling loops starting with a
>> delay of 1 tick and doubling that if necessary up to a maximum total
>> delay of approximately 1 second.
> [...]
>
> I don't think you entirely understood my point.  This exponential
> back-off was the best I could come up with for a self-test
> (ethtool -t) where we have to answer the question 'are interrupts
> working' in a reasonable time, and the user can decide for themselves
> whether this was a false negative.
>
> If you want a definite answer in order to make an automatic decision
> about interrupt modes, there really is no valid time limit.  I think the
This issue is not able interrupt mode selection.

-Prasanna
> proper way to do interrupt mode selection in an in-tree driver is to
> rely on no-MSI quirks for chipsets and devices in the PCI core.
>
> Ben.
>

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
index 19ab215..8ff0fff 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -98,6 +98,16 @@  struct e1000_reg_info {
 #define E1000_TDFTS	0x03428	/* Tx Data FIFO Tail Saved - RW */
 #define E1000_TDFPC	0x03430	/* Tx Data FIFO Packet Count - RW */
 
+/* IRQ latency can be enormous because:
+ * - All IRQs may be disabled on a CPU for a *long* time by e.g. a
+ *   slow serial console or an old IDE driver doing error recovery
+ * - The PREEMPT_RT patches mostly deal with this, but also allow a
+ *   tasklet or normal task to be given higher priority than our IRQ
+ *   threads
+ * Try to avoid blaming the hardware for this.
+ */
+#define IRQ_TIMEOUT HZ
+
 static const struct e1000_reg_info e1000_reg_info_tbl[] = {
 
 	/* General Registers */
@@ -3767,6 +3777,7 @@  static int e1000_test_msi_interrupt(struct e1000_adapter *adapter)
 {
 	struct net_device *netdev = adapter->netdev;
 	struct e1000_hw *hw = &adapter->hw;
+	unsigned long timeout, wait;
 	int err;
 
 	/* poll_enable hasn't been called yet, so don't need disable */
@@ -3799,12 +3810,20 @@  static int e1000_test_msi_interrupt(struct e1000_adapter *adapter)
 	/* fire an unusual interrupt on the test handler */
 	ew32(ICS, E1000_ICS_RXSEQ);
 	e1e_flush();
-	msleep(50);
-
+	timeout = jiffies + IRQ_TIMEOUT;
+	wait = 1;
+	e_dbg("Waiting for MSI interrupt!\n");
+	do {
+		schedule_timeout_uninterruptible(wait);
+		rmb();
+		if (!(adapter->flags & FLAG_MSI_TEST_FAILED))
+			goto success;
+		wait *= 2;
+	} while (time_before(jiffies, timeout));
+
+success:
 	e1000_irq_disable(adapter);
 
-	rmb();
-
 	if (adapter->flags & FLAG_MSI_TEST_FAILED) {
 		adapter->int_mode = E1000E_INT_MODE_LEGACY;
 		e_info("MSI interrupt test failed, using legacy interrupt.\n");