diff mbox

r8169: use RxFIFO overflow workaround for 8168c chipset

Message ID 1296127451-12640-1-git-send-email-ivecera@redhat.com
State Superseded, archived
Delegated to: David Miller
Headers show

Commit Message

Ivan Vecera Jan. 27, 2011, 11:24 a.m. UTC
I found that one of the 8168c chipsets (concretely XID 1c4000c0) starts
generating RxFIFO overflow errors. The result is an infinite loop in
interrupt handler as the RxFIFOOver is handled only for ...MAC_VER_11.
With the workaround everything goes fine.

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
---
 drivers/net/r8169.c |    6 ++++--
 1 files changed, 4 insertions(+), 2 deletions(-)

Comments

Ivan Vecera Jan. 27, 2011, 11:27 a.m. UTC | #1
On Thu, 2011-01-27 at 12:24 +0100, Ivan Vecera wrote:
> I found that one of the 8168c chipsets (concretely XID 1c4000c0) starts
> generating RxFIFO overflow errors. The result is an infinite loop in
> interrupt handler as the RxFIFOOver is handled only for ...MAC_VER_11.
> With the workaround everything goes fine.
...of course starts generating RxFIFO overflow errors under very high
load. :-)

Ivan

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Francois Romieu Jan. 27, 2011, 2:32 p.m. UTC | #2
Ivan Vecera <ivecera@redhat.com> :
> I found that one of the 8168c chipsets (concretely XID 1c4000c0) starts
> generating RxFIFO overflow errors. The result is an infinite loop in
> interrupt handler as the RxFIFOOver is handled only for ...MAC_VER_11.

Acked-by: as your patch ties it to a specific 8168 revision (CFG_METHOD_6
in Realtek's parlance).

Surprizing as it may seem, unconditionaly enabling it has not always
produced the expected result. See 53f57357ff0afc37804f4e82ee3123e0c0a2cad6
for instance. Realtek's r1868 driver ignores it most of time as well.

Was it normal high-load or pktgen like high load ?
Ivan Vecera Jan. 27, 2011, 7:23 p.m. UTC | #3
On Thu, 2011-01-27 at 15:32 +0100, Francois Romieu wrote:
> Ivan Vecera <ivecera@redhat.com> :
> > I found that one of the 8168c chipsets (concretely XID 1c4000c0) starts
> > generating RxFIFO overflow errors. The result is an infinite loop in
> > interrupt handler as the RxFIFOOver is handled only for ...MAC_VER_11.
> 
> Acked-by: as your patch ties it to a specific 8168 revision (CFG_METHOD_6
> in Realtek's parlance).
> 
> Surprizing as it may seem, unconditionaly enabling it has not always
> produced the expected result. See 53f57357ff0afc37804f4e82ee3123e0c0a2cad6
> for instance. Realtek's r1868 driver ignores it most of time as well.
> 
> Was it normal high-load or pktgen like high load ?
The test case was: Migration of the several kvm guests at the same time
between two hosts.

Ivan

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Francois Romieu Jan. 28, 2011, 5:28 p.m. UTC | #4
Ivan Vecera <ivecera@redhat.com> :
[...]
> The test case was: Migration of the several kvm guests at the same time
> between two hosts.

:o(

Could you check if simply leaving the irq handler when status == RxFIFOOver
works by any chance ?

Faced with pktgen, RTL_GIGA_MAC_VER_26 spits RxFIFO overflow errors quite
fast (10000 packets kill it, reliably). The same fix as your avoids the crash.

As is, RTL_GIGA_MAC_VER_12 seems to survive at 900kpps. It signals RXFIFO
overflow, loses half the packets and sends pause control frames but it does
not crash. I have made it leave the irq handler when status == RxFIFOOver.
I'll see if it can stand a few hours like that.
Ivan Vecera Feb. 1, 2011, 10:30 a.m. UTC | #5
On Fri, 2011-01-28 at 18:28 +0100, Francois Romieu wrote:
> Ivan Vecera <ivecera@redhat.com> :
> [...]
> > The test case was: Migration of the several kvm guests at the same time
> > between two hosts.
> 
> :o(
> 
> Could you check if simply leaving the irq handler when status == RxFIFOOver
> works by any chance ?
It does not help, I tried two versions:
1) Simply leave the loop (only "break;") when status == RxFIFOOver
...
if (unlikely(status & RxFIFOOver)) {
	static int rxfifo_count = 0; 
        printk("r8169: Rx FIFO overflows detected: %d\n", 
        	++rxfifo_count);
	break;
}
...

2) the same but with interrupt confirmation before leave
...
if (unlikely(status & RxFIFOOver)) {
	static int rxfifo_count = 0; 
        printk("r8169: Rx FIFO overflows detected: %d\n", 
        	++rxfifo_count);
	RTL_W16(IntrStatus,
		(status & RxFIFOOver) ?
		 (status | RxOverflow) : status);
	break;
}
...

In both cases the NIC continues in generating Rx FIFO overflows.

Ivan

> 
> Faced with pktgen, RTL_GIGA_MAC_VER_26 spits RxFIFO overflow errors quite
> fast (10000 packets kill it, reliably). The same fix as your avoids the crash.
> 
> As is, RTL_GIGA_MAC_VER_12 seems to survive at 900kpps. It signals RXFIFO
> overflow, loses half the packets and sends pause control frames but it does
> not crash. I have made it leave the irq handler when status == RxFIFOOver.
> I'll see if it can stand a few hours like that.
> 


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c
index bde7d61..9ab3b43 100644
--- a/drivers/net/r8169.c
+++ b/drivers/net/r8169.c
@@ -3757,7 +3757,8 @@  static void rtl_hw_start_8168(struct net_device *dev)
 	RTL_W16(IntrMitigate, 0x5151);
 
 	/* Work around for RxFIFO overflow. */
-	if (tp->mac_version == RTL_GIGA_MAC_VER_11) {
+	if (tp->mac_version == RTL_GIGA_MAC_VER_11 ||
+	    tp->mac_version == RTL_GIGA_MAC_VER_22) {
 		tp->intr_event |= RxFIFOOver | PCSTimeout;
 		tp->intr_event &= ~RxOverflow;
 	}
@@ -4641,7 +4642,8 @@  static irqreturn_t rtl8169_interrupt(int irq, void *dev_instance)
 
 		/* Work around for rx fifo overflow */
 		if (unlikely(status & RxFIFOOver) &&
-		(tp->mac_version == RTL_GIGA_MAC_VER_11)) {
+		    (tp->mac_version == RTL_GIGA_MAC_VER_11 ||
+		     tp->mac_version == RTL_GIGA_MAC_VER_22)) {
 			netif_stop_queue(dev);
 			rtl8169_tx_timeout(dev);
 			break;