Message ID | 20200211101741.17350-1-ynezz@true.cz |
---|---|
State | Accepted |
Delegated to: | Petr Štetiar |
Headers | show |
Series | [OpenWrt-Devel] ramips: gsw_mt7621: disable PORT 5 MAC RX/TX flow control by default | expand |
Sent from my iPhone > On Feb 11, 2020, at 2:17 AM, Petr Štetiar <ynezz@true.cz> wrote: > > Looking at the current upstream driver implementation, it seems like the > TX/RX flow control is enabled only if the flow control pause option is > resolved from the device/link partner advertisements (or otherwise set). > > On the other hand, our current in-tree driver force enables TX/RX > flow control by default, thus possibly leading to TX timeouts if the > other end sends pause frames (which are not properly handled?): > > WARNING: CPU: 3 PID: 0 at net/sched/sch_generic.c:320 dev_watchdog+0x1ac/0x324 > NETDEV WATCHDOG: eth0 (mtk_soc_eth): transmit queue 0 timed out > > Disabling the flow control on PORT 5 MAC seems to fix this issues as the > pause frames are then filtered out. While at it, I'm removing the if > condition completely as suggested, since this code is run only on mt7621 > SoC, so there is no need to check for the silicon revisions. Sounds good. > > Ref: https://lists.openwrt.org/pipermail/openwrt-devel/2017-November/009882.html > Ref: https://forum.openwrt.org/t/mtk-soc-eth-watchdog-timeout-after-r11573/50000/12 > Suggested-by: Felix Fietkau <nbd@nbd.name> > Reported-by: Rosen Penev <rosenp@gmail.com> > Signed-off-by: Petr Štetiar <ynezz@true.cz> > --- > .../drivers/net/ethernet/mediatek/gsw_mt7621.c | 12 +++--------- > 1 file changed, 3 insertions(+), 9 deletions(-) > > diff --git a/target/linux/ramips/files-4.14/drivers/net/ethernet/mediatek/gsw_mt7621.c b/target/linux/ramips/files-4.14/drivers/net/ethernet/mediatek/gsw_mt7621.c > index 89be23900738..232bcd8cf4ea 100644 > --- a/target/linux/ramips/files-4.14/drivers/net/ethernet/mediatek/gsw_mt7621.c > +++ b/target/linux/ramips/files-4.14/drivers/net/ethernet/mediatek/gsw_mt7621.c > @@ -98,15 +98,9 @@ static void mt7621_hw_init(struct mt7620_gsw *gsw, struct device_node *np) > mt7530_mdio_w32(gsw, 0x7000, 0x3); > usleep_range(10, 20); > > - if ((rt_sysc_r32(SYSC_REG_CHIP_REV_ID) & 0xFFFF) == 0x0101) { > - /* (GE1, Force 1000M/FD, FC ON, MAX_RX_LENGTH 1536) */ > - mtk_switch_w32(gsw, 0x2305e30b, GSW_REG_MAC_P0_MCR); > - mt7530_mdio_w32(gsw, 0x3600, 0x5e30b); > - } else { > - /* (GE1, Force 1000M/FD, FC ON, MAX_RX_LENGTH 1536) */ > - mtk_switch_w32(gsw, 0x2305e33b, GSW_REG_MAC_P0_MCR); > - mt7530_mdio_w32(gsw, 0x3600, 0x5e33b); > - } > + /* (GE1, Force 1000M/FD, FC OFF, MAX_RX_LENGTH 1536) */ > + mtk_switch_w32(gsw, 0x2305e30b, GSW_REG_MAC_P0_MCR); > + mt7530_mdio_w32(gsw, 0x3600, 0x5e30b); > > /* (GE2, Link down) */ > mtk_switch_w32(gsw, 0x8000, GSW_REG_MAC_P1_MCR);
The sender domain has a DMARC Reject/Quarantine policy which disallows sending mailing list messages using the original "From" header. To mitigate this problem, the original message has been wrapped automatically by the mailing list software. Hi Petr, Quoting Petr Štetiar <ynezz@true.cz>: > Looking at the current upstream driver implementation, it seems like the > TX/RX flow control is enabled only if the flow control pause option is > resolved from the device/link partner advertisements (or otherwise set). > With upstream you mean mainline 5.5 kernel? In the DTS the CPU port is defined as a fixed-link with pause enabled. So the pause bits are always resolved/set. I can't tell if the mainline kernel driver encounters the same issue. I don't use my device daily. I still wonder if there is any race condition in the TX path/handling. I see in the mainline kernel version they are using the same lock in tx_map and tx_poll. And also checking the DMA owner bit in the descriptor. So I think that the following is happening. The hardware can't send the frames fast enough because of the PAUSE frames, maybe to a slower device? The CPU is filling the tx ring faster then the hardware sending it and eventually CPU overtakes the DMA head. Which causes an issue/race/deadlock. By disabling the PAUSE/flow control, it is less likely that is overtake is going happen. But may result in more packet loss. > On the other hand, our current in-tree driver force enables TX/RX > flow control by default, thus possibly leading to TX timeouts if the > other end sends pause frames (which are not properly handled?): > > WARNING: CPU: 3 PID: 0 at net/sched/sch_generic.c:320 > dev_watchdog+0x1ac/0x324 > NETDEV WATCHDOG: eth0 (mtk_soc_eth): transmit queue 0 timed out > > Disabling the flow control on PORT 5 MAC seems to fix this issues as the > pause frames are then filtered out. While at it, I'm removing the if > condition completely as suggested, since this code is run only on mt7621 > SoC, so there is no need to check for the silicon revisions. > Port 6 is connected to the first GMAC of the SOC, not port 5. So it should be PORT 6 in your description also you change the first GMAC settings, maybe mention that too. Greats, René > Ref: > https://lists.openwrt.org/pipermail/openwrt-devel/2017-November/009882.html > Ref: > https://forum.openwrt.org/t/mtk-soc-eth-watchdog-timeout-after-r11573/50000/12 > Suggested-by: Felix Fietkau <nbd@nbd.name> > Reported-by: Rosen Penev <rosenp@gmail.com> > Signed-off-by: Petr Štetiar <ynezz@true.cz> > --- > .../drivers/net/ethernet/mediatek/gsw_mt7621.c | 12 +++--------- > 1 file changed, 3 insertions(+), 9 deletions(-) > > diff --git > a/target/linux/ramips/files-4.14/drivers/net/ethernet/mediatek/gsw_mt7621.c > b/target/linux/ramips/files-4.14/drivers/net/ethernet/mediatek/gsw_mt7621.c > index 89be23900738..232bcd8cf4ea 100644 > --- > a/target/linux/ramips/files-4.14/drivers/net/ethernet/mediatek/gsw_mt7621.c > +++ > b/target/linux/ramips/files-4.14/drivers/net/ethernet/mediatek/gsw_mt7621.c > @@ -98,15 +98,9 @@ static void mt7621_hw_init(struct mt7620_gsw > *gsw, struct device_node *np) > mt7530_mdio_w32(gsw, 0x7000, 0x3); > usleep_range(10, 20); > > - if ((rt_sysc_r32(SYSC_REG_CHIP_REV_ID) & 0xFFFF) == 0x0101) { > - /* (GE1, Force 1000M/FD, FC ON, MAX_RX_LENGTH 1536) */ > - mtk_switch_w32(gsw, 0x2305e30b, GSW_REG_MAC_P0_MCR); > - mt7530_mdio_w32(gsw, 0x3600, 0x5e30b); > - } else { > - /* (GE1, Force 1000M/FD, FC ON, MAX_RX_LENGTH 1536) */ > - mtk_switch_w32(gsw, 0x2305e33b, GSW_REG_MAC_P0_MCR); > - mt7530_mdio_w32(gsw, 0x3600, 0x5e33b); > - } > + /* (GE1, Force 1000M/FD, FC OFF, MAX_RX_LENGTH 1536) */ > + mtk_switch_w32(gsw, 0x2305e30b, GSW_REG_MAC_P0_MCR); > + mt7530_mdio_w32(gsw, 0x3600, 0x5e30b); > > /* (GE2, Link down) */ > mtk_switch_w32(gsw, 0x8000, GSW_REG_MAC_P1_MCR); > > _______________________________________________ > openwrt-devel mailing list > openwrt-devel@lists.openwrt.org > https://lists.openwrt.org/mailman/listinfo/openwrt-devel
René van Dorst via openwrt-devel <openwrt-devel@lists.openwrt.org> [2020-02-11 15:59:52]: [adding Kristian to the Cc: loop] > > Looking at the current upstream driver implementation, it seems like the > > TX/RX flow control is enabled only if the flow control pause option is > > resolved from the device/link partner advertisements (or otherwise set). > > > > With upstream you mean mainline 5.5 kernel? Yes, 5.6-rc1 > In the DTS the CPU port is defined as a fixed-link with pause enabled. > So the pause bits are always resolved/set. I see[1] port@6 with fixed-link, without pause property. > The hardware can't send the frames fast enough because of the PAUSE frames, > maybe to a slower device? The CPU is filling the tx ring faster then the > hardware sending it and eventually CPU overtakes the DMA head. Which causes an > issue/race/deadlock. Probably, I didn't tried to reproduce it or planning to do so. > > Disabling the flow control on PORT 5 MAC seems to fix this issues as the > > pause frames are then filtered out. While at it, I'm removing the if > > condition completely as suggested, since this code is run only on mt7621 > > SoC, so there is no need to check for the silicon revisions. > > > > Port 6 is connected to the first GMAC of the SOC, not port 5. > So it should be PORT 6 in your description also Ok, I took that "PMCR_P5/PORT 5 MAC Control Register" from MT7621_ProgrammingGuide_GSW_v01.pdf. Couldnt find anything about P6, it's quite confusing. 1. https://elixir.bootlin.com/linux/v5.6-rc1/source/drivers/staging/mt7621-dts/mt7621.dtsi#L489 -- ynezz
The sender domain has a DMARC Reject/Quarantine policy which disallows sending mailing list messages using the original "From" header. To mitigate this problem, the original message has been wrapped automatically by the mailing list software. Quoting Petr Štetiar <ynezz@true.cz>: > René van Dorst via openwrt-devel <openwrt-devel@lists.openwrt.org> > [2020-02-11 15:59:52]: > > [adding Kristian to the Cc: loop] > >> > Looking at the current upstream driver implementation, it seems like the >> > TX/RX flow control is enabled only if the flow control pause option is >> > resolved from the device/link partner advertisements (or otherwise set). >> > >> >> With upstream you mean mainline 5.5 kernel? > > Yes, 5.6-rc1 > >> In the DTS the CPU port is defined as a fixed-link with pause enabled. >> So the pause bits are always resolved/set. > > I see[1] port@6 with fixed-link, without pause property. Here is more information I wrote [1] about the mt7530 switch and it modes and also the mt7621 example below. I used the pause because AFAIK there was/is no pause issue. > >> The hardware can't send the frames fast enough because of the PAUSE frames, >> maybe to a slower device? The CPU is filling the tx ring faster then the >> hardware sending it and eventually CPU overtakes the DMA head. >> Which causes an >> issue/race/deadlock. > > Probably, I didn't tried to reproduce it or planning to do so. > I hope to reproduce it quick by reducing the TX ring size from 4096 to 256. >> > Disabling the flow control on PORT 5 MAC seems to fix this issues as the >> > pause frames are then filtered out. While at it, I'm removing the if >> > condition completely as suggested, since this code is run only on mt7621 >> > SoC, so there is no need to check for the silicon revisions. >> > >> >> Port 6 is connected to the first GMAC of the SOC, not port 5. >> So it should be PORT 6 in your description also > > Ok, I took that "PMCR_P5/PORT 5 MAC Control Register" from > MT7621_ProgrammingGuide_GSW_v01.pdf. Couldnt find anything about P6, it's > quite confusing. I totally understand that. Lack of complete documentation is an issue. P6 = MAC6 in the documentation. AFAIK all the MACs do have the same register layout. So MAC6 = PORT 6 and is connected to the first GMAC of the SOC. PORT 5 = MAC5 and can be used as 2nd CPU port or connect to an external phy. So MAC5, 2nd GMAC of the SOC and an external PHY share the same RGMII bus. Depending on GPIO/PORT settings they are connected to the BUS. See also mt7530.txt kernel doc for more info. > > 1. > https://elixir.bootlin.com/linux/v5.6-rc1/source/drivers/staging/mt7621-dts/mt7621.dtsi#L489 > > -- ynezz > > _______________________________________________ > openwrt-devel mailing list > openwrt-devel@lists.openwrt.org > https://lists.openwrt.org/mailman/listinfo/openwrt-devel [1]: https://github.com/torvalds/linux/blob/master/Documentation/devicetree/bindings/net/dsa/mt7530.txt#L38 Greats, René
The sender domain has a DMARC Reject/Quarantine policy which disallows sending mailing list messages using the original "From" header. To mitigate this problem, the original message has been wrapped automatically by the mailing list software. Quoting René van Dorst <opensource@vdorst.com>: > Quoting Petr Štetiar <ynezz@true.cz>: > >> René van Dorst via openwrt-devel <openwrt-devel@lists.openwrt.org> >> [2020-02-11 15:59:52]: >> >> [adding Kristian to the Cc: loop] >> >>>> Looking at the current upstream driver implementation, it seems like the >>>> TX/RX flow control is enabled only if the flow control pause option is >>>> resolved from the device/link partner advertisements (or otherwise set). >>>> >>> >>> With upstream you mean mainline 5.5 kernel? >> >> Yes, 5.6-rc1 >> >>> In the DTS the CPU port is defined as a fixed-link with pause enabled. >>> So the pause bits are always resolved/set. >> >> I see[1] port@6 with fixed-link, without pause property. > > Here is more information I wrote [1] about the mt7530 switch and it > modes and also > the mt7621 example below. I used the pause because AFAIK there > was/is no pause issue. > >> >>> The hardware can't send the frames fast enough because of the PAUSE frames, >>> maybe to a slower device? The CPU is filling the tx ring faster then the >>> hardware sending it and eventually CPU overtakes the DMA head. >>> Which causes an >>> issue/race/deadlock. >> >> Probably, I didn't tried to reproduce it or planning to do so. >> > > I hope to reproduce it quick by reducing the TX ring size from 4096 to 256. > Hi Petr, I thing I looking in the right direction. With this patch, reducing the tx_ring size to 4 * 4 = 16 packets, I can easily trigger a transmit timeout. Just create some traffic. rene@pc-rene:~/dev/openwrt$ git diff diff --git a/target/linux/ramips/files-4.14/drivers/net/ethernet/mediatek/mtk_eth_soc.c b/target/linux/ramips/files-4.14/drivers/net/ethernet/mediatek/mtk_eth_soc.c index c6f4d90193b3..09d5e75ec0b0 100644 --- a/target/linux/ramips/files-4.14/drivers/net/ethernet/mediatek/mtk_eth_soc.c +++ b/target/linux/ramips/files-4.14/drivers/net/ethernet/mediatek/mtk_eth_soc.c @@ -1658,7 +1658,7 @@ static int fe_probe(struct platform_device *pdev) priv->msg_enable = netif_msg_init(fe_msg_level, FE_DEFAULT_MSG_ENABLE); priv->rx_ring.frag_size = fe_max_frag_size(ETH_DATA_LEN); priv->rx_ring.rx_buf_size = fe_max_buf_size(priv->rx_ring.frag_size); - priv->tx_ring.tx_ring_size = NUM_DMA_DESC; + priv->tx_ring.tx_ring_size = 4; priv->rx_ring.rx_ring_size = NUM_DMA_DESC; INIT_WORK(&priv->pending_work, fe_pending_work); u64_stats_init(&priv->hw_stats->syncp); Greats, René >>>> Disabling the flow control on PORT 5 MAC seems to fix this issues as the >>>> pause frames are then filtered out. While at it, I'm removing the if >>>> condition completely as suggested, since this code is run only on mt7621 >>>> SoC, so there is no need to check for the silicon revisions. >>>> >>> >>> Port 6 is connected to the first GMAC of the SOC, not port 5. >>> So it should be PORT 6 in your description also >> >> Ok, I took that "PMCR_P5/PORT 5 MAC Control Register" from >> MT7621_ProgrammingGuide_GSW_v01.pdf. Couldnt find anything about P6, it's >> quite confusing. > > I totally understand that. Lack of complete documentation is an issue. > > P6 = MAC6 in the documentation. AFAIK all the MACs do have the same register > layout. So MAC6 = PORT 6 and is connected to the first GMAC of the SOC. > PORT 5 = MAC5 and can be used as 2nd CPU port or connect to an external phy. > So MAC5, 2nd GMAC of the SOC and an external PHY share the same RGMII bus. > Depending on GPIO/PORT settings they are connected to the BUS. See > also mt7530.txt > kernel doc for more info. > >> >> 1. >> https://elixir.bootlin.com/linux/v5.6-rc1/source/drivers/staging/mt7621-dts/mt7621.dtsi#L489 >> >> -- ynezz >> >> _______________________________________________ >> openwrt-devel mailing list >> openwrt-devel@lists.openwrt.org >> https://lists.openwrt.org/mailman/listinfo/openwrt-devel > > [1]: > https://github.com/torvalds/linux/blob/master/Documentation/devicetree/bindings/net/dsa/mt7530.txt#L38 > > Greats, > > René
The sender domain has a DMARC Reject/Quarantine policy which disallows sending mailing list messages using the original "From" header. To mitigate this problem, the original message has been wrapped automatically by the mailing list software. Quoting René van Dorst <opensource@vdorst.com>: > Quoting René van Dorst <opensource@vdorst.com>: > >> Quoting Petr Štetiar <ynezz@true.cz>: >> >>> René van Dorst via openwrt-devel <openwrt-devel@lists.openwrt.org> >>> [2020-02-11 15:59:52]: >>> >>> [adding Kristian to the Cc: loop] >>> >>>>> Looking at the current upstream driver implementation, it seems like the >>>>> TX/RX flow control is enabled only if the flow control pause option is >>>>> resolved from the device/link partner advertisements (or otherwise set). >>>>> >>>> >>>> With upstream you mean mainline 5.5 kernel? >>> >>> Yes, 5.6-rc1 >>> >>>> In the DTS the CPU port is defined as a fixed-link with pause enabled. >>>> So the pause bits are always resolved/set. >>> >>> I see[1] port@6 with fixed-link, without pause property. >> >> Here is more information I wrote [1] about the mt7530 switch and it >> modes and also >> the mt7621 example below. I used the pause because AFAIK there >> was/is no pause issue. >> >>> >>>> The hardware can't send the frames fast enough because of the >>>> PAUSE frames, >>>> maybe to a slower device? The CPU is filling the tx ring faster then the >>>> hardware sending it and eventually CPU overtakes the DMA head. >>>> Which causes an >>>> issue/race/deadlock. >>> >>> Probably, I didn't tried to reproduce it or planning to do so. >>> >> >> I hope to reproduce it quick by reducing the TX ring size from 4096 to 256. >> > > Hi Petr, > > I thing I looking in the right direction. > With this patch, reducing the tx_ring size to 4 * 4 = 16 packets, I > can easily > trigger a transmit timeout. > Just create some traffic. > Hi Petr, Although it triggers the timeout, it seems the driver/functions don't expected such small dma sizes. With some extra debug code added, I see that only the first 2 entries are filled and a timeout hits. By increasing the size to 8*4=32 packets the drivers works as expected. I tried to flood the device with packets. Iperf3 (LAN->WAN) through the device with and in parallel iperf3 from device to LAN. After more then +500GBytes it was still going. So I am not sure about my theory. Maybe we could add some extra debug code in the timeout(). FE can set the flag "DMA DONE" for every packet that was send. Debug code should show that bit for every entry. Maybe this gives a bit more info. Greats, René > > rene@pc-rene:~/dev/openwrt$ git diff > diff --git > a/target/linux/ramips/files-4.14/drivers/net/ethernet/mediatek/mtk_eth_soc.c > b/target/linux/ramips/files-4.14/drivers/net/ethernet/mediatek/mtk_eth_soc.c > index c6f4d90193b3..09d5e75ec0b0 100644 > --- > a/target/linux/ramips/files-4.14/drivers/net/ethernet/mediatek/mtk_eth_soc.c > +++ > b/target/linux/ramips/files-4.14/drivers/net/ethernet/mediatek/mtk_eth_soc.c > @@ -1658,7 +1658,7 @@ static int fe_probe(struct platform_device *pdev) > priv->msg_enable = netif_msg_init(fe_msg_level, > FE_DEFAULT_MSG_ENABLE); > priv->rx_ring.frag_size = fe_max_frag_size(ETH_DATA_LEN); > priv->rx_ring.rx_buf_size = fe_max_buf_size(priv->rx_ring.frag_size); > - priv->tx_ring.tx_ring_size = NUM_DMA_DESC; > + priv->tx_ring.tx_ring_size = 4; > priv->rx_ring.rx_ring_size = NUM_DMA_DESC; > INIT_WORK(&priv->pending_work, fe_pending_work); > u64_stats_init(&priv->hw_stats->syncp); > > Greats, > > René > >>>>> Disabling the flow control on PORT 5 MAC seems to fix this issues as the >>>>> pause frames are then filtered out. While at it, I'm removing the if >>>>> condition completely as suggested, since this code is run only on mt7621 >>>>> SoC, so there is no need to check for the silicon revisions. >>>>> >>>> >>>> Port 6 is connected to the first GMAC of the SOC, not port 5. >>>> So it should be PORT 6 in your description also >>> >>> Ok, I took that "PMCR_P5/PORT 5 MAC Control Register" from >>> MT7621_ProgrammingGuide_GSW_v01.pdf. Couldnt find anything about P6, it's >>> quite confusing. >> >> I totally understand that. Lack of complete documentation is an issue. >> >> P6 = MAC6 in the documentation. AFAIK all the MACs do have the same register >> layout. So MAC6 = PORT 6 and is connected to the first GMAC of the SOC. >> PORT 5 = MAC5 and can be used as 2nd CPU port or connect to an external phy. >> So MAC5, 2nd GMAC of the SOC and an external PHY share the same RGMII bus. >> Depending on GPIO/PORT settings they are connected to the BUS. See >> also mt7530.txt >> kernel doc for more info. >> >>> >>> 1. >>> https://elixir.bootlin.com/linux/v5.6-rc1/source/drivers/staging/mt7621-dts/mt7621.dtsi#L489 >>> >>> -- ynezz >>> >>> _______________________________________________ >>> openwrt-devel mailing list >>> openwrt-devel@lists.openwrt.org >>> https://lists.openwrt.org/mailman/listinfo/openwrt-devel >> >> [1]: >> https://github.com/torvalds/linux/blob/master/Documentation/devicetree/bindings/net/dsa/mt7530.txt#L38 >> >> Greats, >> >> René
René van Dorst via openwrt-devel <openwrt-devel@lists.openwrt.org> [2020-02-13 15:14:09]: Hi, > Maybe we could add some extra debug code in the timeout(). FE can set the > flag "DMA DONE" for every packet that was send. Debug code should show that > bit for every entry. Maybe this gives a bit more info. I'm all in to fix this properly as that proposed patch still feels like workaround to me. There seems to be a lot of users interested/affected/able to test it on the forum[1], so it would be really nice from you if you could find some spare time to fix it. I mean prepare some debug/fix patch, post it on that forum topic, wait for the response, iterate... Thanks for taking care! 1. https://forum.openwrt.org/t/mtk-soc-eth-watchdog-timeout-after-r11573 -- ynezz
Hi everyone, I am sorry for my late reply to this thread. My email provider flagged it as spam, so I only saw the conversation now. It seems that you have reached a conclusion on how to proceed, but I thought I should anyway share my notes/observations on this issue (in case they can be useful). My employer has a large number of Mediatek-based (mt7620 and mt7621) routers in production. Most routers have a minimum of two internet connections - one fixed and one using mobile broadband. Some time in 2017 we started receiving reports from a few customers that the switch would stop working. The link was up, but no data would go through. Looking at the logs, we could always see the "TX timeout" error message and we started to look for a cause. We quickly ruled out any kind of crash, as the LTE was still up and wifi worked fine. After getting a few of these reports, we started looking for things that were common between the different installations. We struggled to find any, there were all sorts of devices connected to the different ports on the routers. The only thing the different cases had in common, was that the problem disappeared when whatever was connected to the WAN port was disconnected. However, again, the equipment that provided the fixed connection came from all sorts of vendors. After scratching our heads for a while and not getting anywhere, I asked here on the mailing list and was told that restarting networking should at least make the switch works fine again. We added a watchdog doing exactly that when the TX timeout message would appear. Restarting networking improved the situation considerably, but the switch would still sometimes get stuck and never recover. This triggered us to make a second attempt at recreating the problem. Our test was the same as what Rene described. We assumed the problem had something to do with sending large amounts of traffic and at a high speed, so we used iperf3 as a traffic generator and sent traffic between different machines connected to the switch. One of these machines were quite unstable and prone to crash, and we noticed that whenever that machine would crash the TX timeout issue would trigger and no traffic would pass through the switch. A normal packet capture didn't reveal anything interesting, but connecting a network tap did. Looking at the packets captured from the tap, we could see a flood of pause frames from the crashed machine. When this flood occurred, the switch stopped transmitting packets on all the ports and not just the one that the crashed machine was connected to. This caught us by surprise, but doing some research it seems to be a common behavior among "normal" switches. Also, if we waited long enough, the switch would never recover. After discovering that pause frames seemed to be at least one trigger for TX timeout, we added support to the driver for enabling/disabling flow control on each of the ports + an init script that does the disabling. Since we deployed this change on our routers, we have not had a single report about switches that stop working. We do sometimes still see the "TX timeout" error, but it is no longer critical. We never tried to disable flow control on the CPU-port only, which seems like a more elegant approach than disabling each port individually. I do agree that disabling pause frames is more a work-around than a solution, but it has at least eradicated the problem for us. I never got around to submitting our patch, but if anyone would find it useful I can do it quite soon. BR, Kristian
diff --git a/target/linux/ramips/files-4.14/drivers/net/ethernet/mediatek/gsw_mt7621.c b/target/linux/ramips/files-4.14/drivers/net/ethernet/mediatek/gsw_mt7621.c index 89be23900738..232bcd8cf4ea 100644 --- a/target/linux/ramips/files-4.14/drivers/net/ethernet/mediatek/gsw_mt7621.c +++ b/target/linux/ramips/files-4.14/drivers/net/ethernet/mediatek/gsw_mt7621.c @@ -98,15 +98,9 @@ static void mt7621_hw_init(struct mt7620_gsw *gsw, struct device_node *np) mt7530_mdio_w32(gsw, 0x7000, 0x3); usleep_range(10, 20); - if ((rt_sysc_r32(SYSC_REG_CHIP_REV_ID) & 0xFFFF) == 0x0101) { - /* (GE1, Force 1000M/FD, FC ON, MAX_RX_LENGTH 1536) */ - mtk_switch_w32(gsw, 0x2305e30b, GSW_REG_MAC_P0_MCR); - mt7530_mdio_w32(gsw, 0x3600, 0x5e30b); - } else { - /* (GE1, Force 1000M/FD, FC ON, MAX_RX_LENGTH 1536) */ - mtk_switch_w32(gsw, 0x2305e33b, GSW_REG_MAC_P0_MCR); - mt7530_mdio_w32(gsw, 0x3600, 0x5e33b); - } + /* (GE1, Force 1000M/FD, FC OFF, MAX_RX_LENGTH 1536) */ + mtk_switch_w32(gsw, 0x2305e30b, GSW_REG_MAC_P0_MCR); + mt7530_mdio_w32(gsw, 0x3600, 0x5e30b); /* (GE2, Link down) */ mtk_switch_w32(gsw, 0x8000, GSW_REG_MAC_P1_MCR);
Looking at the current upstream driver implementation, it seems like the TX/RX flow control is enabled only if the flow control pause option is resolved from the device/link partner advertisements (or otherwise set). On the other hand, our current in-tree driver force enables TX/RX flow control by default, thus possibly leading to TX timeouts if the other end sends pause frames (which are not properly handled?): WARNING: CPU: 3 PID: 0 at net/sched/sch_generic.c:320 dev_watchdog+0x1ac/0x324 NETDEV WATCHDOG: eth0 (mtk_soc_eth): transmit queue 0 timed out Disabling the flow control on PORT 5 MAC seems to fix this issues as the pause frames are then filtered out. While at it, I'm removing the if condition completely as suggested, since this code is run only on mt7621 SoC, so there is no need to check for the silicon revisions. Ref: https://lists.openwrt.org/pipermail/openwrt-devel/2017-November/009882.html Ref: https://forum.openwrt.org/t/mtk-soc-eth-watchdog-timeout-after-r11573/50000/12 Suggested-by: Felix Fietkau <nbd@nbd.name> Reported-by: Rosen Penev <rosenp@gmail.com> Signed-off-by: Petr Štetiar <ynezz@true.cz> --- .../drivers/net/ethernet/mediatek/gsw_mt7621.c | 12 +++--------- 1 file changed, 3 insertions(+), 9 deletions(-)