Message ID | 1296127451-12640-1-git-send-email-ivecera@redhat.com |
---|---|
State | Superseded, archived |
Delegated to: | David Miller |
Headers | show |
On Thu, 2011-01-27 at 12:24 +0100, Ivan Vecera wrote: > I found that one of the 8168c chipsets (concretely XID 1c4000c0) starts > generating RxFIFO overflow errors. The result is an infinite loop in > interrupt handler as the RxFIFOOver is handled only for ...MAC_VER_11. > With the workaround everything goes fine. ...of course starts generating RxFIFO overflow errors under very high load. :-) Ivan -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Ivan Vecera <ivecera@redhat.com> : > I found that one of the 8168c chipsets (concretely XID 1c4000c0) starts > generating RxFIFO overflow errors. The result is an infinite loop in > interrupt handler as the RxFIFOOver is handled only for ...MAC_VER_11. Acked-by: as your patch ties it to a specific 8168 revision (CFG_METHOD_6 in Realtek's parlance). Surprizing as it may seem, unconditionaly enabling it has not always produced the expected result. See 53f57357ff0afc37804f4e82ee3123e0c0a2cad6 for instance. Realtek's r1868 driver ignores it most of time as well. Was it normal high-load or pktgen like high load ?
On Thu, 2011-01-27 at 15:32 +0100, Francois Romieu wrote: > Ivan Vecera <ivecera@redhat.com> : > > I found that one of the 8168c chipsets (concretely XID 1c4000c0) starts > > generating RxFIFO overflow errors. The result is an infinite loop in > > interrupt handler as the RxFIFOOver is handled only for ...MAC_VER_11. > > Acked-by: as your patch ties it to a specific 8168 revision (CFG_METHOD_6 > in Realtek's parlance). > > Surprizing as it may seem, unconditionaly enabling it has not always > produced the expected result. See 53f57357ff0afc37804f4e82ee3123e0c0a2cad6 > for instance. Realtek's r1868 driver ignores it most of time as well. > > Was it normal high-load or pktgen like high load ? The test case was: Migration of the several kvm guests at the same time between two hosts. Ivan -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Ivan Vecera <ivecera@redhat.com> : [...] > The test case was: Migration of the several kvm guests at the same time > between two hosts. :o( Could you check if simply leaving the irq handler when status == RxFIFOOver works by any chance ? Faced with pktgen, RTL_GIGA_MAC_VER_26 spits RxFIFO overflow errors quite fast (10000 packets kill it, reliably). The same fix as your avoids the crash. As is, RTL_GIGA_MAC_VER_12 seems to survive at 900kpps. It signals RXFIFO overflow, loses half the packets and sends pause control frames but it does not crash. I have made it leave the irq handler when status == RxFIFOOver. I'll see if it can stand a few hours like that.
On Fri, 2011-01-28 at 18:28 +0100, Francois Romieu wrote: > Ivan Vecera <ivecera@redhat.com> : > [...] > > The test case was: Migration of the several kvm guests at the same time > > between two hosts. > > :o( > > Could you check if simply leaving the irq handler when status == RxFIFOOver > works by any chance ? It does not help, I tried two versions: 1) Simply leave the loop (only "break;") when status == RxFIFOOver ... if (unlikely(status & RxFIFOOver)) { static int rxfifo_count = 0; printk("r8169: Rx FIFO overflows detected: %d\n", ++rxfifo_count); break; } ... 2) the same but with interrupt confirmation before leave ... if (unlikely(status & RxFIFOOver)) { static int rxfifo_count = 0; printk("r8169: Rx FIFO overflows detected: %d\n", ++rxfifo_count); RTL_W16(IntrStatus, (status & RxFIFOOver) ? (status | RxOverflow) : status); break; } ... In both cases the NIC continues in generating Rx FIFO overflows. Ivan > > Faced with pktgen, RTL_GIGA_MAC_VER_26 spits RxFIFO overflow errors quite > fast (10000 packets kill it, reliably). The same fix as your avoids the crash. > > As is, RTL_GIGA_MAC_VER_12 seems to survive at 900kpps. It signals RXFIFO > overflow, loses half the packets and sends pause control frames but it does > not crash. I have made it leave the irq handler when status == RxFIFOOver. > I'll see if it can stand a few hours like that. > -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c index bde7d61..9ab3b43 100644 --- a/drivers/net/r8169.c +++ b/drivers/net/r8169.c @@ -3757,7 +3757,8 @@ static void rtl_hw_start_8168(struct net_device *dev) RTL_W16(IntrMitigate, 0x5151); /* Work around for RxFIFO overflow. */ - if (tp->mac_version == RTL_GIGA_MAC_VER_11) { + if (tp->mac_version == RTL_GIGA_MAC_VER_11 || + tp->mac_version == RTL_GIGA_MAC_VER_22) { tp->intr_event |= RxFIFOOver | PCSTimeout; tp->intr_event &= ~RxOverflow; } @@ -4641,7 +4642,8 @@ static irqreturn_t rtl8169_interrupt(int irq, void *dev_instance) /* Work around for rx fifo overflow */ if (unlikely(status & RxFIFOOver) && - (tp->mac_version == RTL_GIGA_MAC_VER_11)) { + (tp->mac_version == RTL_GIGA_MAC_VER_11 || + tp->mac_version == RTL_GIGA_MAC_VER_22)) { netif_stop_queue(dev); rtl8169_tx_timeout(dev); break;
I found that one of the 8168c chipsets (concretely XID 1c4000c0) starts generating RxFIFO overflow errors. The result is an infinite loop in interrupt handler as the RxFIFOOver is handled only for ...MAC_VER_11. With the workaround everything goes fine. Signed-off-by: Ivan Vecera <ivecera@redhat.com> --- drivers/net/r8169.c | 6 ++++-- 1 files changed, 4 insertions(+), 2 deletions(-)