[1/1] net: ftgmac100: Fix Aspeed ast2600 TX hang issue

Message ID 20201014060632.16085-2-dylan_hung@aspeedtech.com
State Not Applicable, archived
Series Fix Aspeed ast2600 MAC TX hang

Commit Message

Dylan Hung Oct. 14, 2020, 6:06 a.m. UTC
The new HW arbitration feature on Aspeed ast2600 will cause MAC TX to
hang when handling scatter-gather DMA.  Disable the problematic feature
by setting MAC register 0x58 bit28 and bit27.

Signed-off-by: Dylan Hung <dylan_hung@aspeedtech.com>
---
 drivers/net/ethernet/faraday/ftgmac100.c | 5 +++++
 drivers/net/ethernet/faraday/ftgmac100.h | 8 ++++++++
 2 files changed, 13 insertions(+)

Comments

Joel Stanley Oct. 14, 2020, 6:41 a.m. UTC | #1
On Wed, 14 Oct 2020 at 06:07, Dylan Hung <dylan_hung@aspeedtech.com> wrote:
>
> The new HW arbitration feature on Aspeed ast2600 will cause MAC TX to
> hang when handling scatter-gather DMA.  Disable the problematic feature
> by setting MAC register 0x58 bit28 and bit27.

Hi Dylan,

What are the symptoms of this issue? We are seeing this on our systems:

[29376.090637] WARNING: CPU: 0 PID: 9 at net/sched/sch_generic.c:442
dev_watchdog+0x2f0/0x2f4
[29376.099898] NETDEV WATCHDOG: eth0 (ftgmac100): transmit queue 0 timed out

> Signed-off-by: Dylan Hung <dylan_hung@aspeedtech.com>

This fixes support for the ast2600, so we can put:

Fixes: 39bfab8844a0 ("net: ftgmac100: Add support for DT phy-handle property")

Reviewed-by: Joel Stanley <joel@jms.id.au>

> ---
>  drivers/net/ethernet/faraday/ftgmac100.c | 5 +++++
>  drivers/net/ethernet/faraday/ftgmac100.h | 8 ++++++++
>  2 files changed, 13 insertions(+)
>
> diff --git a/drivers/net/ethernet/faraday/ftgmac100.c b/drivers/net/ethernet/faraday/ftgmac100.c
> index 87236206366f..00024dd41147 100644
> --- a/drivers/net/ethernet/faraday/ftgmac100.c
> +++ b/drivers/net/ethernet/faraday/ftgmac100.c
> @@ -1817,6 +1817,11 @@ static int ftgmac100_probe(struct platform_device *pdev)
>                 priv->rxdes0_edorr_mask = BIT(30);
>                 priv->txdes0_edotr_mask = BIT(30);
>                 priv->is_aspeed = true;
> +               /* Disable ast2600 problematic HW arbitration */
> +               if (of_device_is_compatible(np, "aspeed,ast2600-mac")) {
> +                       iowrite32(FTGMAC100_TM_DEFAULT,
> +                                 priv->base + FTGMAC100_OFFSET_TM);
> +               }
>         } else {
>                 priv->rxdes0_edorr_mask = BIT(15);
>                 priv->txdes0_edotr_mask = BIT(15);
> diff --git a/drivers/net/ethernet/faraday/ftgmac100.h b/drivers/net/ethernet/faraday/ftgmac100.h
> index e5876a3fda91..63b3e02fab16 100644
> --- a/drivers/net/ethernet/faraday/ftgmac100.h
> +++ b/drivers/net/ethernet/faraday/ftgmac100.h
> @@ -169,6 +169,14 @@
>  #define FTGMAC100_MACCR_FAST_MODE      (1 << 19)
>  #define FTGMAC100_MACCR_SW_RST         (1 << 31)
>
> +/*
> + * test mode control register
> + */
> +#define FTGMAC100_TM_RQ_TX_VALID_DIS (1 << 28)
> +#define FTGMAC100_TM_RQ_RR_IDLE_PREV (1 << 27)
> +#define FTGMAC100_TM_DEFAULT                                                   \
> +       (FTGMAC100_TM_RQ_TX_VALID_DIS | FTGMAC100_TM_RQ_RR_IDLE_PREV)

Will aspeed issue an updated datasheet with this register documented?


> +
>  /*
>   * PHY control register
>   */
> --
> 2.17.1
>
Dylan Hung Oct. 14, 2020, 7:58 a.m. UTC | #2
Hi Joel,

> -----Original Message-----
> From: Joel Stanley [mailto:joel@jms.id.au]
> Sent: Wednesday, October 14, 2020 2:41 PM
> To: Dylan Hung <dylan_hung@aspeedtech.com>
> Cc: David S . Miller <davem@davemloft.net>; Jakub Kicinski
> <kuba@kernel.org>; netdev@vger.kernel.org; Linux Kernel Mailing List
> <linux-kernel@vger.kernel.org>; Po-Yu Chuang <ratbert@faraday-tech.com>;
> linux-aspeed <linux-aspeed@lists.ozlabs.org>; OpenBMC Maillist
> <openbmc@lists.ozlabs.org>; BMC-SW <BMC-SW@aspeedtech.com>
> Subject: Re: [PATCH 1/1] net: ftgmac100: Fix Aspeed ast2600 TX hang issue
> 
> On Wed, 14 Oct 2020 at 06:07, Dylan Hung <dylan_hung@aspeedtech.com>
> wrote:
> >
> > The new HW arbitration feature on Aspeed ast2600 will cause MAC TX to
> > hang when handling scatter-gather DMA.  Disable the problematic
> > feature by setting MAC register 0x58 bit28 and bit27.
> 
> Hi Dylan,
> 
> What are the symptoms of this issue? We are seeing this on our systems:
> 
> [29376.090637] WARNING: CPU: 0 PID: 9 at net/sched/sch_generic.c:442
> dev_watchdog+0x2f0/0x2f4
> [29376.099898] NETDEV WATCHDOG: eth0 (ftgmac100): transmit queue 0
> timed out
> 

May I know your soc version? This issue happens on ast2600 version A1.  The registers to fix this issue are meaningless/reserved on A0 chip, so it is okay to set them on either A0 or A1.
I was encountering this issue when I was running the iperf TX test.  The symptom is the TX descriptors are consumed, but no complete packet is sent out.

> > Signed-off-by: Dylan Hung <dylan_hung@aspeedtech.com>
> 
> This fixes support for the ast2600, so we can put:
> 
> Fixes: 39bfab8844a0 ("net: ftgmac100: Add support for DT phy-handle property")
> 
> Reviewed-by: Joel Stanley <joel@jms.id.au>
> 
> > ---
> >  drivers/net/ethernet/faraday/ftgmac100.c | 5 +++++
> >  drivers/net/ethernet/faraday/ftgmac100.h | 8 ++++++++
> >  2 files changed, 13 insertions(+)
> >
> > diff --git a/drivers/net/ethernet/faraday/ftgmac100.c b/drivers/net/ethernet/faraday/ftgmac100.c
> > index 87236206366f..00024dd41147 100644
> > --- a/drivers/net/ethernet/faraday/ftgmac100.c
> > +++ b/drivers/net/ethernet/faraday/ftgmac100.c
> > @@ -1817,6 +1817,11 @@ static int ftgmac100_probe(struct platform_device *pdev)
> >                 priv->rxdes0_edorr_mask = BIT(30);
> >                 priv->txdes0_edotr_mask = BIT(30);
> >                 priv->is_aspeed = true;
> > +               /* Disable ast2600 problematic HW arbitration */
> > +               if (of_device_is_compatible(np, "aspeed,ast2600-mac")) {
> > +                       iowrite32(FTGMAC100_TM_DEFAULT,
> > +                                 priv->base + FTGMAC100_OFFSET_TM);
> > +               }
> >         } else {
> >                 priv->rxdes0_edorr_mask = BIT(15);
> >                 priv->txdes0_edotr_mask = BIT(15);
> > diff --git a/drivers/net/ethernet/faraday/ftgmac100.h b/drivers/net/ethernet/faraday/ftgmac100.h
> > index e5876a3fda91..63b3e02fab16 100644
> > --- a/drivers/net/ethernet/faraday/ftgmac100.h
> > +++ b/drivers/net/ethernet/faraday/ftgmac100.h
> > @@ -169,6 +169,14 @@
> >  #define FTGMAC100_MACCR_FAST_MODE      (1 << 19)
> >  #define FTGMAC100_MACCR_SW_RST         (1 << 31)
> >
> > +/*
> > + * test mode control register
> > + */
> > +#define FTGMAC100_TM_RQ_TX_VALID_DIS (1 << 28)
> > +#define FTGMAC100_TM_RQ_RR_IDLE_PREV (1 << 27)
> > +#define FTGMAC100_TM_DEFAULT                                                   \
> > +       (FTGMAC100_TM_RQ_TX_VALID_DIS | FTGMAC100_TM_RQ_RR_IDLE_PREV)
> 
> Will aspeed issue an updated datasheet with this register documented?
> 
> 
> > +
> >  /*
> >   * PHY control register
> >   */
> > --
> > 2.17.1
> >
Joel Stanley Oct. 14, 2020, 10:31 p.m. UTC | #3
On Wed, 14 Oct 2020 at 13:32, Dylan Hung <dylan_hung@aspeedtech.com> wrote:
> > > The new HW arbitration feature on Aspeed ast2600 will cause MAC TX to
> > > hang when handling scatter-gather DMA.  Disable the problematic
> > > feature by setting MAC register 0x58 bit28 and bit27.
> >
> > Hi Dylan,
> >
> > What are the symptoms of this issue? We are seeing this on our systems:
> >
> > [29376.090637] WARNING: CPU: 0 PID: 9 at net/sched/sch_generic.c:442
> > dev_watchdog+0x2f0/0x2f4
> > [29376.099898] NETDEV WATCHDOG: eth0 (ftgmac100): transmit queue 0
> > timed out
> >
>
> May I know your soc version? This issue happens on ast2600 version A1.  The registers to fix this issue are meaningless/reserved on A0 chip, so it is okay to set them on either A0 or A1.

We are running the A1. All of our A0 parts have been replaced with A1.

> I was encountering this issue when I was running the iperf TX test.  The symptom is the TX descriptors are consumed, but no complete packet is sent out.

What parameters are you using for iperf? I did a lot of testing with
iperf3 (and stress-ng running at the same time) and couldn't reproduce
the error.

We could only reproduce it when performing other functions, such as
debugging/booting the host processor.

> > > +/*
> > > + * test mode control register
> > > + */
> > > +#define FTGMAC100_TM_RQ_TX_VALID_DIS (1 << 28)
> > > +#define FTGMAC100_TM_RQ_RR_IDLE_PREV (1 << 27)
> > > +#define FTGMAC100_TM_DEFAULT                                                   \
> > > +       (FTGMAC100_TM_RQ_TX_VALID_DIS | FTGMAC100_TM_RQ_RR_IDLE_PREV)
> >
> > Will aspeed issue an updated datasheet with this register documented?

Did you see this question?

Cheers,

Joel
Dylan Hung Oct. 15, 2020, 1:49 a.m. UTC | #4
> -----Original Message-----
> From: Joel Stanley [mailto:joel@jms.id.au]
> Sent: Thursday, October 15, 2020 6:31 AM
> To: Dylan Hung <dylan_hung@aspeedtech.com>
> Cc: David S . Miller <davem@davemloft.net>; Jakub Kicinski
> <kuba@kernel.org>; netdev@vger.kernel.org; Linux Kernel Mailing List
> <linux-kernel@vger.kernel.org>; Po-Yu Chuang <ratbert@faraday-tech.com>;
> linux-aspeed <linux-aspeed@lists.ozlabs.org>; OpenBMC Maillist
> <openbmc@lists.ozlabs.org>; BMC-SW <BMC-SW@aspeedtech.com>
> Subject: Re: [PATCH 1/1] net: ftgmac100: Fix Aspeed ast2600 TX hang issue
> 
> On Wed, 14 Oct 2020 at 13:32, Dylan Hung <dylan_hung@aspeedtech.com>
> wrote:
> > > > The new HW arbitration feature on Aspeed ast2600 will cause MAC TX
> > > > to hang when handling scatter-gather DMA.  Disable the problematic
> > > > feature by setting MAC register 0x58 bit28 and bit27.
> > >
> > > Hi Dylan,
> > >
> > > What are the symptoms of this issue? We are seeing this on our systems:
> > >
> > > [29376.090637] WARNING: CPU: 0 PID: 9 at net/sched/sch_generic.c:442
> > > dev_watchdog+0x2f0/0x2f4
> > > [29376.099898] NETDEV WATCHDOG: eth0 (ftgmac100): transmit queue 0
> > > timed out
> > >
> >
> > May I know your soc version? This issue happens on ast2600 version A1.
> > The registers to fix this issue are meaningless/reserved on A0 chip, so it is
> > okay to set them on either A0 or A1.
> 
> We are running the A1. All of our A0 parts have been replaced with A1.
> 
> > I was encountering this issue when I was running the iperf TX test.  The
> > symptom is the TX descriptors are consumed, but no complete packet is sent
> > out.
> 
> What parameters are you using for iperf? I did a lot of testing with
> iperf3 (and stress-ng running at the same time) and couldn't reproduce the
> error.
> 

I simply use "iperf -c <server ip>" on ast2600.  It is very easy to reproduce. I have appended the log below.
Note that this issue only happens when HW scatter-gather (NETIF_F_SG) is on.

[AST /]$ iperf3 -c 192.168.100.89
Connecting to host 192.168.100.89, port 5201
[  4] local 192.168.100.45 port 45346 connected to 192.168.100.89 port 5201
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.00   sec  44.8 MBytes   375 Mbits/sec    2   1.43 KBytes
[  4]   1.00-2.00   sec  0.00 Bytes  0.00 bits/sec    2   1.43 KBytes
[  4]   2.00-3.00   sec  0.00 Bytes  0.00 bits/sec    0   1.43 KBytes
[  4]   3.00-4.00   sec  0.00 Bytes  0.00 bits/sec    1   1.43 KBytes
[  4]   4.00-5.00   sec  0.00 Bytes  0.00 bits/sec    0   1.43 KBytes
^C[  4]   5.00-5.88   sec  0.00 Bytes  0.00 bits/sec    0   1.43 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-5.88   sec  44.8 MBytes  64.0 Mbits/sec    5             sender
[  4]   0.00-5.88   sec  0.00 Bytes  0.00 bits/sec                  receiver
iperf3: interrupt - the client has terminated

> We could only reproduce it when performing other functions, such as
> debugging/booting the host processor.
> 
Could it be another issue?

> > > > +/*
> > > > + * test mode control register
> > > > + */
> > > > +#define FTGMAC100_TM_RQ_TX_VALID_DIS (1 << 28)
> > > > +#define FTGMAC100_TM_RQ_RR_IDLE_PREV (1 << 27)
> > > > +#define FTGMAC100_TM_DEFAULT                                                   \
> > > > +       (FTGMAC100_TM_RQ_TX_VALID_DIS | FTGMAC100_TM_RQ_RR_IDLE_PREV)
> > >
> > > Will aspeed issue an updated datasheet with this register documented?
> 
> Did you see this question?
> 
Sorry, I missed this question.  Aspeed will update the datasheet accordingly.

> Cheers,
> 
> Joel
Joel Stanley Oct. 15, 2020, 2:32 a.m. UTC | #5
On Thu, 15 Oct 2020 at 01:49, Dylan Hung <dylan_hung@aspeedtech.com> wrote:
> > > I was encountering this issue when I was running the iperf TX test.  The
> > > symptom is the TX descriptors are consumed, but no complete packet is sent
> > > out.
> >
> > What parameters are you using for iperf? I did a lot of testing with
> > iperf3 (and stress-ng running at the same time) and couldn't reproduce the
> > error.
> >
>
> I simply use "iperf -c <server ip>" on ast2600.  It is very easy to reproduce. I append the log below:
> Noticed that this issue only happens when HW scatter-gather (NETIF_F_SG) is on.

Ok. This appears to be on by default in the
drivers/net/ethernet/faraday/ftgmac100.c:

        netdev->hw_features = NETIF_F_RXCSUM | NETIF_F_HW_CSUM |
                NETIF_F_GRO | NETIF_F_SG | NETIF_F_HW_VLAN_CTAG_RX |
                NETIF_F_HW_VLAN_CTAG_TX;

> [AST /]$ iperf3 -c 192.168.100.89
> Connecting to host 192.168.100.89, port 5201
> [  4] local 192.168.100.45 port 45346 connected to 192.168.100.89 port 5201
> [ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
> [  4]   0.00-1.00   sec  44.8 MBytes   375 Mbits/sec    2   1.43 KBytes
> [  4]   1.00-2.00   sec  0.00 Bytes  0.00 bits/sec    2   1.43 KBytes
> [  4]   2.00-3.00   sec  0.00 Bytes  0.00 bits/sec    0   1.43 KBytes
> [  4]   3.00-4.00   sec  0.00 Bytes  0.00 bits/sec    1   1.43 KBytes
> [  4]   4.00-5.00   sec  0.00 Bytes  0.00 bits/sec    0   1.43 KBytes
> ^C[  4]   5.00-5.88   sec  0.00 Bytes  0.00 bits/sec    0   1.43 KBytes
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID] Interval           Transfer     Bandwidth       Retr
> [  4]   0.00-5.88   sec  44.8 MBytes  64.0 Mbits/sec    5             sender
> [  4]   0.00-5.88   sec  0.00 Bytes  0.00 bits/sec                  receiver
> iperf3: interrupt - the client has terminated

I just realised my test machine must be on a 100Mbit network. I will
try testing on a gigabit network.

> > We could only reproduce it when performing other functions, such as
> > debugging/booting the host processor.
> >
> Could it be another issue?

I hope not! We have deployed your patch on our systems and I will let
you know if we see the bug again.

> > > > > +/*
> > > > > + * test mode control register
> > > > > + */
> > > > > +#define FTGMAC100_TM_RQ_TX_VALID_DIS (1 << 28)
> > > > > +#define FTGMAC100_TM_RQ_RR_IDLE_PREV (1 << 27)
> > > > > +#define FTGMAC100_TM_DEFAULT                                                   \
> > > > > +       (FTGMAC100_TM_RQ_TX_VALID_DIS | FTGMAC100_TM_RQ_RR_IDLE_PREV)
> > > >
> > > > Will aspeed issue an updated datasheet with this register documented?
> >
> > Did you see this question?
> >
> Sorry, I missed this question.  Aspeed will update the datasheet accordingly.

Thank you.
Jakub Kicinski Oct. 16, 2020, 10:37 p.m. UTC | #6
On Wed, 14 Oct 2020 14:06:32 +0800 Dylan Hung wrote:
> The new HW arbitration feature on Aspeed ast2600 will cause MAC TX to
> hang when handling scatter-gather DMA.  Disable the problematic feature
> by setting MAC register 0x58 bit28 and bit27.
> 
> Signed-off-by: Dylan Hung <dylan_hung@aspeedtech.com>

Applied, thank you.

Patch

diff --git a/drivers/net/ethernet/faraday/ftgmac100.c b/drivers/net/ethernet/faraday/ftgmac100.c
index 87236206366f..00024dd41147 100644
--- a/drivers/net/ethernet/faraday/ftgmac100.c
+++ b/drivers/net/ethernet/faraday/ftgmac100.c
@@ -1817,6 +1817,11 @@  static int ftgmac100_probe(struct platform_device *pdev)
 		priv->rxdes0_edorr_mask = BIT(30);
 		priv->txdes0_edotr_mask = BIT(30);
 		priv->is_aspeed = true;
+		/* Disable ast2600 problematic HW arbitration */
+		if (of_device_is_compatible(np, "aspeed,ast2600-mac")) {
+			iowrite32(FTGMAC100_TM_DEFAULT,
+				  priv->base + FTGMAC100_OFFSET_TM);
+		}
 	} else {
 		priv->rxdes0_edorr_mask = BIT(15);
 		priv->txdes0_edotr_mask = BIT(15);
diff --git a/drivers/net/ethernet/faraday/ftgmac100.h b/drivers/net/ethernet/faraday/ftgmac100.h
index e5876a3fda91..63b3e02fab16 100644
--- a/drivers/net/ethernet/faraday/ftgmac100.h
+++ b/drivers/net/ethernet/faraday/ftgmac100.h
@@ -169,6 +169,14 @@ 
 #define FTGMAC100_MACCR_FAST_MODE	(1 << 19)
 #define FTGMAC100_MACCR_SW_RST		(1 << 31)
 
+/*
+ * test mode control register
+ */
+#define FTGMAC100_TM_RQ_TX_VALID_DIS (1 << 28)
+#define FTGMAC100_TM_RQ_RR_IDLE_PREV (1 << 27)
+#define FTGMAC100_TM_DEFAULT                                                   \
+	(FTGMAC100_TM_RQ_TX_VALID_DIS | FTGMAC100_TM_RQ_RR_IDLE_PREV)
+
 /*
  * PHY control register
  */