| Field | Value |
|---|---|
| Message ID | CAKfDRXjwDqXBMJakmmfSP5+kCE-YimQacYYB8gj2y6br=j-+TQ@mail.gmail.com |
| State | RFC |
Hi,

On Sat, Aug 26, 2017 at 4:47 PM, Kristian Evensen <kristian.evensen@gmail.com> wrote:
> I guess our common theory is that something is causing TX interrupts
> not to be enabled again. After reading up on NAPI
> (https://wiki.linuxfoundation.org/networking/napi), it seems that the
> recommended way of using NAPI on clear-on-write devices (like the
> RT5350) is to use NAPI for RX and do TX in the interrupt handler. In
> the current driver, both TX and RX are handled in the NAPI callback
> fe_poll(). I have modified the driver to split RX and TX, so now
> fe_poll() only deals with RX and TX is dealt with in fe_handle_irq().
> I have attached the (messy) patch I am currently testing. If this is
> an OK solution, I will clean up the patch and submit it to the list. I
> will leave my tests running overnight and report back if anything pops
> up.

I did not have to wait very long. Unfortunately, my router crashed right after I sent the previous email. So I guess that means back to the drawing board.

-Kristian
Hello again,

On Sat, Aug 26, 2017 at 4:57 PM, Kristian Evensen <kristian.evensen@gmail.com> wrote:
> I did not have to wait very long. Unfortunately, my router crashed
> right after I sent the previous email. So I guess that means back to
> the drawing board.

I have kept working on this issue throughout the weekend and have noticed something strange. I don't know if this observation is relevant or not, and I still haven't found out why the queue times out (why the queue is stopped), but perhaps the observation can help point us in the right direction.

I noticed that TX_WB_DDONE is set, so I decided to dump the state of each txd when the timeout occurs. When I do this (and the timeout has been triggered), I see that dtx always points to txds with DDONE set, while ctx points to txds where DDONE is not set. To me, this seems to be the inverse of how it should be: dtx should point to a txd where DDONE is not set, while ctx should point to a txd with DDONE set. Could it be that these counters/indexes for some reason go out of sync and that this triggers the timeout? There is data to be sent, but it is not seen.

To give a more complete example, say we have the following error:

[ 1816.133130] mtk_soc_eth 1e100000.ethernet eth0: transmit timed out
[ 1816.139304] mtk_soc_eth 1e100000.ethernet eth0: dma_cfg:80000067
[ 1816.145288] mtk_soc_eth 1e100000.ethernet eth0: tx_ring=0, base=0ee44000, max=512, ctx=302, dtx=202, fdx=202, next=302
[ 1816.155968] mtk_soc_eth 1e100000.ethernet eth0: rx_ring=0, base=0ec0e000, max=512, calc=486, drx=484

Here, txd[202-301] have DDONE set, while DDONE is not set for txd[302-201] (wrapping around the 512-entry ring).

Thanks in advance for all the help so far,
Kristian
Hello,

It turns out that the assumption that the "transmit timed out" issue was related to pause frames/flow control was incorrect. I have recently started to see the error again, with flow control disabled. However, unlike last time, I am now able to reliably trigger the issue.

The timeout seems to be triggered by connectivity problems between MT7621-based routers (not sure if it applies to other devices with the MT7530 switch) and the next hop. I checked each client connected to some of the routers exhibiting this issue, and it turns out that some had bad cables, etc. In order to check the theory in a more controlled fashion, I set up the following small testbed:

NUC (192.168.1.1)
  <-> (192.168.1.2) ZBT 3526 (192.168.2.1)
  <-> (192.168.2.2) ZBT 2626 (192.168.3.1)
  <-> (192.168.3.2) Client

I then configured port forwarding from the 3526 all the way to the client and hammered the client with small UDP packets. Then, at random points, I intentionally hung the kernel on the 2626 by triggering an RCU error causing a stall. L2 was still up, but the 2626 did not reply to any packets, including ARP (so the neighbor-table entry for 192.168.2.2 was quickly lost).

More or less as soon as the kernel hung, the transmit timeout error message started showing up. If I restart networking or enable/disable the ports, then everything works fine for a bit (I can for example ping 192.168.1.1 from 192.168.1.2), but after some time the error appears again.

I have been trying to solve this myself for a couple of days, but I am starting to run out of ideas. Could it be that there is some traffic destined for the client (via the 2626) that gets stuck in the TX queue on the 3526? Any help, pointers on where to look or ideas for what could be wrong would be much appreciated.

Thanks in advance for the help,
Kristian
Hello,

On Wed, Nov 8, 2017 at 10:56 PM, Kristian Evensen <kristian.evensen@gmail.com> wrote:
> I have been trying to solve this myself for a couple of days, but I am
> starting to run out of ideas. Could it be that there is some traffic
> destined for the client (via the 2626) that gets stuck in the TX
> queue on the 3526? Any help, pointers on where to look or ideas for
> what could be wrong would be much appreciated.

I have been doing some more debugging during the day today, and it seems that the problem is that the hardware for some reason stops transmitting data. I dumped the content of the TX ring on every timeout, and the output seems to indicate that the state of the ring is sane. For example, with the following error:

[  325.650638] mtk_soc_eth 1e100000.ethernet eth0: transmit timed out
[  325.656857] mtk_soc_eth 1e100000.ethernet eth0: dma_cfg:80000067
[  325.662902] mtk_soc_eth 1e100000.ethernet eth0: tx_ring=0, base=0ef58000, max=512, ctx=384, dtx=0, fdx=0, next=384
[  325.673234] mtk_soc_eth 1e100000.ethernet eth0: rx_ring=0, base=0ef5e000, max=512, calc=5, drx=6

I see that the CPU txds [384-511] have DDONE set and no SKB, while DDONE is not set for the DMA txds [0-383] and an skb is attached. I also looked at the content of the skb, and as far as I can see it is valid. Looking at the content of the SKB also shows that fe_reset_pending() does its job: for every timeout, there is a new set of packets on the ring. So new packets are put on the ring, but none are sent.

-Kristian
On Thu, Nov 9, 2017 at 12:06 PM, Kristian Evensen <kristian.evensen@gmail.com> wrote:
> I see that the CPU txds [384-511] have DDONE set and no SKB, while
> DDONE is not set for the DMA txds [0-383] and an skb is attached. I
> also looked at the content of the skb, and as far as I can see it is
> valid. Looking at the content of the SKB also shows that
> fe_reset_pending() does its job. For every timeout, there is a new set
> of packets on the ring. So new packets are put on the ring, but none
> are sent.

I have been hammering away on this issue during the day, and it seems that the DMA engine, TX, etc. work just fine. However, for some reason, the port with the router that has hung is able to stop the whole switch. If I disable the port (or disconnect the cable), then TX works again and I can for example reach 192.168.1.1 from 192.168.1.2 in my testbed.

When running ping (from 192.168.1.2 to 192.168.1.1) while disconnecting the cable, the first packets had a very high RTT (~20 ms). Running tcpdump showed that the reply arrived immediately, so it seems the packets are stuck in a TX buffer for a really long time. Could it be that there is a cache or something internally on the switch that is causing packets to be held back, and that this cache is invalidated and buffers flushed when I disable the port? I cleared the switch, DIP and SIP tables without any effect.

If I enable the port, then the problem appears again after a little while (~30 seconds).

-Kristian
On Thu, Nov 9, 2017 at 5:21 PM, Kristian Evensen <kristian.evensen@gmail.com> wrote:
> I have been hammering away on this issue during the day, and it seems
> that the DMA engine, TX, etc. work just fine. However, for some
> reason, the port with the router that has hung is able to stop the
> whole switch. If I disable the port (or disconnect the cable), then TX
> works again and I can for example reach 192.168.1.1 from 192.168.1.2
> in my testbed. When running ping (from 192.168.1.2 to 192.168.1.1)
> while disconnecting the cable, the first packets had a very high RTT
> (~20 ms). Running tcpdump showed that the reply arrived immediately, so
> it seems the packets are stuck in a TX buffer for a really long time.
> Could it be that there is a cache or something internally on the
> switch that is causing packets to be held back, and that this cache is
> invalidated and buffers flushed when I disable the port? I cleared the
> switch, DIP and SIP tables without any effect.
>
> If I enable the port, then the problem appears again after a little
> while (~30 seconds).

I replaced the 3526 with other devices containing the MT7530 switch (both MT7621- and MT7623-based boards), and the issue seems to be related to the switch rather than the SoC. I am able to reliably trigger the timeout on all devices I have tested, both running proprietary drivers/firmware and LEDE. I guess this points to there being some traffic pattern or network behavior that triggers an error in the MT7530 and causes TX to freeze. Restarting the ports makes the switch work again, but as long as the "bad" device is connected to the MT7530, it is just a matter of time before the timeout is back.

-Kristian
Hello,

On Thu, Nov 9, 2017 at 8:42 PM, Kristian Evensen <kristian.evensen@gmail.com> wrote:
> I replaced the 3526 with other devices containing the MT7530 switch
> (both MT7621- and MT7623-based boards), and the issue seems to be
> related to the switch rather than the SoC. I am able to reliably
> trigger the timeout on all devices I have tested, both running
> proprietary drivers/firmware and LEDE. I guess this points to there
> being some traffic pattern or network behavior that triggers an
> error in the MT7530 and causes TX to freeze. Restarting the ports
> makes the switch work again, but as long as the "bad" device is
> connected to the MT7530, it is just a matter of time before the
> timeout is back.

I think I am ready to conclude on this issue. First of all, I have discovered that I made an incorrect statement earlier: I have not seen the problem with flow control disabled.

After finding a network tap and a device which passes for example pause frames to the driver, so I can see them with tcpdump, I think I finally see what is going on. I connected the tap between router #1 and router #2, and performed the test described earlier with flow control enabled and disabled. When triggering the RCU stall, I see a continuous flood of pause frames coming from router #2. This flood happens irrespective of whether flow control is enabled or not. However, with flow control enabled, I see that the RxPause and TxPause counters increase. With flow control disabled, they remain at 0. In other words, it seems that the switch filters out pause frames if the bit is unset in the feature register (it would be great if someone could confirm/deny this).

The MT7530 switch seems to use one buffer for all ports, so what I have seen all along is head-of-line blocking. Since I use iperf in UDP mode, the traffic destined for router #2 never slows down and fills up the buffer. Thus, all other traffic is blocked. When disabling the port used by router #2, the buffer is cleared and packets can flow as normal again.

With flow control disabled, I do not see the head-of-line blocking. If I am connected to router #1, I can always reach it. If flow control is enabled, router #1 stops replying to for example ping when the pause flood starts.

I don't know what the correct "solution" for this problem is. I asked Piotr to mark my patch for always disabling flow control as not applicable, but perhaps it should be brought back if everyone agrees that disabling flow control is OK. If not, then perhaps the following patch should be accepted so that it is possible to switch flow control on/off: https://lists.openwrt.org/pipermail/openwrt-devel/2016-April/040705.html

BR,
Kristian
From 98ce0313cd8654fe69028c19b6f08da1d0671d75 Mon Sep 17 00:00:00 2001
From: Kristian Evensen <kristian.evensen@gmail.com>
Date: Sat, 26 Aug 2017 16:05:44 +0200
Subject: [PATCH] [FIX] Move TX out of Napi

This is a broken patch only meant for backup.
---
 .../drivers/net/ethernet/mtk/mtk_eth_soc.c | 36 +++++++++++++++++-----
 1 file changed, 28 insertions(+), 8 deletions(-)

diff --git a/target/linux/ramips/files-4.9/drivers/net/ethernet/mtk/mtk_eth_soc.c b/target/linux/ramips/files-4.9/drivers/net/ethernet/mtk/mtk_eth_soc.c
index 5b354a6563..af865826e8 100644
--- a/target/linux/ramips/files-4.9/drivers/net/ethernet/mtk/mtk_eth_soc.c
+++ b/target/linux/ramips/files-4.9/drivers/net/ethernet/mtk/mtk_eth_soc.c
@@ -684,10 +684,13 @@ static int fe_tx_map_dma(struct sk_buff *skb, struct net_device *dev,
 	 */
 	wmb();
 	if (unlikely(fe_empty_txd(ring) <= ring->tx_thresh)) {
+		pr_info_ratelimited("netif_stop_queue %u %u\n", fe_empty_txd(ring), ring->tx_thresh);
 		netif_stop_queue(dev);
 		smp_mb();
-		if (unlikely(fe_empty_txd(ring) > ring->tx_thresh))
+		if (unlikely(fe_empty_txd(ring) > ring->tx_thresh)) {
+			pr_info_ratelimited("netif_wake_queue\n");
 			netif_wake_queue(dev);
+		}
 	}

 	if (netif_xmit_stopped(netdev_get_tx_queue(dev, 0)) || !skb->xmit_more)
@@ -781,6 +784,10 @@ static int fe_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	tx_num = fe_cal_txd_req(skb);
 	if (unlikely(fe_empty_txd(ring) <= tx_num)) {
+		if (skb->xmit_more) {
+			pr_info_ratelimited("SKB XMIT_MORE\n");
+			fe_reg_w32(ring->tx_next_idx, FE_REG_TX_CTX_IDX0);
+		}
 		netif_stop_queue(dev);
 		netif_err(priv, tx_queued, dev,
 			  "Tx Ring full when queue awake!\n");
@@ -967,11 +974,12 @@ static int fe_poll(struct napi_struct *napi, int budget)
 		status_reg = FE_REG_FE_INT_STATUS;
 	}

+	/*
 	if (status & tx_intr) {
 		fe_reg_w32(tx_intr, FE_REG_FE_INT_STATUS);
 		tx_done = fe_poll_tx(priv);
 		status = fe_reg_r32(FE_REG_FE_INT_STATUS);
-	}
+	}*/

 	if (status & rx_intr)
 		rx_done = fe_poll_rx(napi, budget, priv, rx_intr);
@@ -1042,18 +1050,30 @@ static irqreturn_t fe_handle_irq(int irq, void *dev)
 {
 	struct fe_priv *priv = netdev_priv(dev);
 	u32 status, int_mask;
+	u32 tx_intr, rx_intr;
+
+	tx_intr = priv->soc->tx_int;
+	rx_intr = priv->soc->rx_int;

 	status = fe_reg_r32(FE_REG_FE_INT_STATUS);
 	if (unlikely(!status))
 		return IRQ_NONE;

-	int_mask = (priv->soc->rx_int | priv->soc->tx_int);
+	int_mask = tx_intr | rx_intr;
 	if (likely(status & int_mask)) {
-		if (likely(napi_schedule_prep(&priv->rx_napi))) {
-			fe_int_disable(int_mask);
-			__napi_schedule(&priv->rx_napi);
-		}
+		if (status & rx_intr) {
+			if (likely(napi_schedule_prep(&priv->rx_napi))) {
+				fe_int_disable(rx_intr);
+				__napi_schedule(&priv->rx_napi);
+			}
+		}
+
+		if (status & tx_intr) {
+			fe_reg_w32(tx_intr, FE_REG_FE_INT_STATUS);
+			fe_poll_tx(priv);
+			fe_int_enable(tx_intr);
+		}
 	} else {
 		fe_reg_w32(status, FE_REG_FE_INT_STATUS);
 	}
@@ -1549,7 +1569,7 @@ static int fe_probe(struct platform_device *pdev)

 	platform_set_drvdata(pdev, netdev);

-	netif_info(priv, probe, netdev, "mediatek frame engine at 0x%08lx, irq %d\n",
+	netif_info(priv, probe, netdev, "mediatek frame engine at 0x%08lx, irq %d - patched\n",
 		   netdev->base_addr, netdev->irq);

 	return 0;
-- 
2.11.0