diff mbox series

ath79: ar8216: make switch register access atomic

Message ID 20200921065744.491098-1-gch981213@gmail.com
State Rejected
Delegated to: Chuanhong Guo
Headers show
Series ath79: ar8216: make switch register access atomic | expand

Commit Message

Chuanhong Guo Sept. 21, 2020, 6:57 a.m. UTC
reg accesses on integrated ar8229 sometimes fails. As a result, phy read
got incorrect port status and wan link goes down and up mysteriously.
After comparing ar8216 with the old driver, these local_irq_save/restore
calls are the only meaningful differences I could find and it does fix
the issue.
The same changes were added in svn r26856 by Gabor Juhos:
ar71xx: ag71xx: make switch register access atomic

As I can't find the underlying problem either, this hack is broght
back to fix the unstable link issue.
This hack is only suitable for ath79 mdio and may easily break the
driver on other platform. Limit it to ath79-only as a target patch.

Fixes: FS#2216
Fixes: FS#3226
Signed-off-by: Chuanhong Guo <gch981213@gmail.com>
---
 .../930-ar8216-make-reg-access-atomic.patch   | 61 +++++++++++++++++++
 1 file changed, 61 insertions(+)
 create mode 100644 target/linux/ath79/patches-5.4/930-ar8216-make-reg-access-atomic.patch

Comments

Felix Fietkau Sept. 21, 2020, 1:24 p.m. UTC | #1
On 2020-09-21 08:57, Chuanhong Guo wrote:
> reg accesses on integrated ar8229 sometimes fails. As a result, phy read
> got incorrect port status and wan link goes down and up mysteriously.
> After comparing ar8216 with the old driver, these local_irq_save/restore
> calls are the only meaningful differences I could find and it does fix
> the issue.
> The same changes were added in svn r26856 by Gabor Juhos:
> ar71xx: ag71xx: make switch register access atomic
> 
> As I can't find the underlying problem either, this hack is broght
> back to fix the unstable link issue.
> This hack is only suitable for ath79 mdio and may easily break the
> driver on other platform. Limit it to ath79-only as a target patch.
> 
> Fixes: FS#2216
> Fixes: FS#3226
> Signed-off-by: Chuanhong Guo <gch981213@gmail.com>
> ---
>  .../930-ar8216-make-reg-access-atomic.patch   | 61 +++++++++++++++++++
>  1 file changed, 61 insertions(+)
>  create mode 100644 target/linux/ath79/patches-5.4/930-ar8216-make-reg-access-atomic.patch
> 
> diff --git a/target/linux/ath79/patches-5.4/930-ar8216-make-reg-access-atomic.patch b/target/linux/ath79/patches-5.4/930-ar8216-make-reg-access-atomic.patch
> new file mode 100644
> index 0000000000..42f3305195
> --- /dev/null
> +++ b/target/linux/ath79/patches-5.4/930-ar8216-make-reg-access-atomic.patch
> @@ -0,0 +1,61 @@
> +From b3797d1a92afe97c173b00fdb7824cedba24eef0 Mon Sep 17 00:00:00 2001
> +From: Chuanhong Guo <gch981213@gmail.com>
> +Date: Sun, 20 Sep 2020 01:00:45 +0800
> +Subject: [PATCH] ath79: ar8216: make switch register access atomic
> +
> +due to some unknown reason these register accesses sometimes fail
> +on the integrated switch without this patch.
> +
> +THIS ONLY WORKS ON ATH79 AND MAY BREAK THE DRIVER ON OTHER PLATFORMS!
> +The mdio bus on ath79 works in polling mode and doesn't rely on
> +any interrupt. This patch breaks the driver on any mdio master
> +with interrupts used.
> +
> +---
> +--- a/drivers/net/phy/ar8216.c
> ++++ b/drivers/net/phy/ar8216.c
> +@@ -253,12 +253,14 @@ ar8xxx_mii_write32(struct ar8xxx_priv *p
> + u32
> + ar8xxx_read(struct ar8xxx_priv *priv, int reg)
> + {
> ++	unsigned long flags;
> + 	struct mii_bus *bus = priv->mii_bus;
> + 	u16 r1, r2, page;
> + 	u32 val;
> + 
> + 	split_addr((u32) reg, &r1, &r2, &page);
> + 
> ++	local_irq_save(flags);
> + 	mutex_lock(&bus->mdio_lock);
Taking a mutex in an irq-disabled section is basically asking for a
deadlock. What router is this issue reproduced on? What else is running
at the time this happens?

- Felix
Chuanhong Guo Sept. 21, 2020, 4:26 p.m. UTC | #2
Hi!

On Mon, Sep 21, 2020 at 9:24 PM Felix Fietkau <nbd@nbd.name> wrote:
> > +
> > +     split_addr((u32) reg, &r1, &r2, &page);
> > +
> > ++    local_irq_save(flags);
> > +     mutex_lock(&bus->mdio_lock);
> Taking a mutex in an irq-disabled section is basically asking for a
> deadlock. What router is this issue reproduced on?

Any ar934x/qca953x router using built-in switch, based on
Flyspray bug reports. I can reproduce it on ar9341/tp-link-wr841n-v8
As there's an ancient commit about the same issue I guess
it appears on ar724x/ar933x as well.
here's the commit message:
commit 5d77f370d695c9a70f25ffb8367db64efadaaedd
Author: Gabor Juhos <juhosg@openwrt.org>
Date:   Sun May 8 16:32:53 2011 +0000

    ar71xx: ag71xx: make switch register access atomic

    Reading of the PHY registers occasionally returns with bogus values
    under heavy load. This misleads the PHY driver and thus causes false
    link/speed change notifications which leads to performance loss.
[and some dmesg after this]

> What else is running at the time this happens?

I'm using a minimal wr841n-v8 image. (make menuconfig,
select the device, save and build the firmware)
no extra background tasks are running.
swconfig led trigger is used so the kernel is constantly
polling port status from switch in the background.
Felix Fietkau Sept. 21, 2020, 4:37 p.m. UTC | #3
On 2020-09-21 18:26, Chuanhong Guo wrote:
> Hi!
> 
> On Mon, Sep 21, 2020 at 9:24 PM Felix Fietkau <nbd@nbd.name> wrote:
>> > +
>> > +     split_addr((u32) reg, &r1, &r2, &page);
>> > +
>> > ++    local_irq_save(flags);
>> > +     mutex_lock(&bus->mdio_lock);
>> Taking a mutex in an irq-disabled section is basically asking for a
>> deadlock. What router is this issue reproduced on?
> 
> Any ar934x/qca953x router using built-in switch, based on
> Flyspray bug reports. I can reproduce it on ar9341/tp-link-wr841n-v8
> As there's an ancient commit about the same issue I guess
> it appears on ar724x/ar933x as well.
> here's the commit message:
> commit 5d77f370d695c9a70f25ffb8367db64efadaaedd
> Author: Gabor Juhos <juhosg@openwrt.org>
> Date:   Sun May 8 16:32:53 2011 +0000
> 
>     ar71xx: ag71xx: make switch register access atomic
> 
>     Reading of the PHY registers occasionally returns with bogus values
>     under heavy load. This misleads the PHY driver and thus causes false
>     link/speed change notifications which leads to performance loss.
> [and some dmesg after this]
> 
>> What else is running at the time this happens?
> 
> I'm using a minimal wr841n-v8 image. (make menuconfig,
> select the device, save and build the firmware)
> no extra background tasks are running.
> swconfig led trigger is used so the kernel is constantly
> polling port status from switch in the background.
I have another idea what you could try. Many SoCs enable both MDIO
busses, and maybe they have some shared resources/state in the hw.

You could try making a global spinlock in ag71xx_mdio.c and do
spin_lock_bh(&mdio_lock) in ag71xx_mdio_mii_read and ag71xx_mdio_mii_write

- Felix
Adrian Schmutzler Sept. 23, 2020, 9:53 a.m. UTC | #4
> -----Original Message-----
> From: openwrt-devel [mailto:openwrt-devel-bounces@lists.openwrt.org]
> On Behalf Of Chuanhong Guo
> Sent: Montag, 21. September 2020 08:58
> To: openwrt-devel@openwrt.org
> Cc: Chuanhong Guo <gch981213@gmail.com>
> Subject: [PATCH] ath79: ar8216: make switch register access atomic
> 
> reg accesses on integrated ar8229 sometimes fails. As a result, phy read got
> incorrect port status and wan link goes down and up mysteriously.
> After comparing ar8216 with the old driver, these local_irq_save/restore calls
> are the only meaningful differences I could find and it does fix the issue.
> The same changes were added in svn r26856 by Gabor Juhos:
> ar71xx: ag71xx: make switch register access atomic
> 
> As I can't find the underlying problem either, this hack is broght back to fix
> the unstable link issue.
> This hack is only suitable for ath79 mdio and may easily break the driver on
> other platform. Limit it to ath79-only as a target patch.

This patch fixes the problem for me on TL-WR841N v12 (qca9533), tested on top of 19.07.4.

Tested-by: Adrian Schmutzler <freifunk@adrianschmutzler.de>
Acked-by: Adrian Schmutzler <freifunk@adrianschmutzler.de>

Best

Adrian

> 
> Fixes: FS#2216
> Fixes: FS#3226
> Signed-off-by: Chuanhong Guo <gch981213@gmail.com>
> ---
>  .../930-ar8216-make-reg-access-atomic.patch   | 61 +++++++++++++++++++
>  1 file changed, 61 insertions(+)
>  create mode 100644 target/linux/ath79/patches-5.4/930-ar8216-make-reg-
> access-atomic.patch
> 
> diff --git a/target/linux/ath79/patches-5.4/930-ar8216-make-reg-access-
> atomic.patch b/target/linux/ath79/patches-5.4/930-ar8216-make-reg-access-
> atomic.patch
> new file mode 100644
> index 0000000000..42f3305195
> --- /dev/null
> +++ b/target/linux/ath79/patches-5.4/930-ar8216-make-reg-access-atomic.p
> +++ atch
> @@ -0,0 +1,61 @@
> +From b3797d1a92afe97c173b00fdb7824cedba24eef0 Mon Sep 17 00:00:00
> 2001
> +From: Chuanhong Guo <gch981213@gmail.com>
> +Date: Sun, 20 Sep 2020 01:00:45 +0800
> +Subject: [PATCH] ath79: ar8216: make switch register access atomic
> +
> +due to some unknown reason these register accesses sometimes fail on
> +the integrated switch without this patch.
> +
> +THIS ONLY WORKS ON ATH79 AND MAY BREAK THE DRIVER ON OTHER
> PLATFORMS!
> +The mdio bus on ath79 works in polling mode and doesn't rely on any
> +interrupt. This patch breaks the driver on any mdio master with
> +interrupts used.
> +
> +---
> +--- a/drivers/net/phy/ar8216.c
> ++++ b/drivers/net/phy/ar8216.c
> +@@ -253,12 +253,14 @@ ar8xxx_mii_write32(struct ar8xxx_priv *p
> + u32
> + ar8xxx_read(struct ar8xxx_priv *priv, int reg)  {
> ++	unsigned long flags;
> + 	struct mii_bus *bus = priv->mii_bus;
> + 	u16 r1, r2, page;
> + 	u32 val;
> +
> + 	split_addr((u32) reg, &r1, &r2, &page);
> +
> ++	local_irq_save(flags);
> + 	mutex_lock(&bus->mdio_lock);
> +
> + 	bus->write(bus, 0x18, 0, page);
> +@@ -266,6 +268,7 @@ ar8xxx_read(struct ar8xxx_priv *priv, in
> + 	val = ar8xxx_mii_read32(priv, 0x10 | r2, r1);
> +
> + 	mutex_unlock(&bus->mdio_lock);
> ++	local_irq_restore(flags);
> +
> + 	return val;
> + }
> +@@ -273,11 +276,13 @@ ar8xxx_read(struct ar8xxx_priv *priv, in  void
> +ar8xxx_write(struct ar8xxx_priv *priv, int reg, u32 val)  {
> ++	unsigned long flags;
> + 	struct mii_bus *bus = priv->mii_bus;
> + 	u16 r1, r2, page;
> +
> + 	split_addr((u32) reg, &r1, &r2, &page);
> +
> ++	local_irq_save(flags);
> + 	mutex_lock(&bus->mdio_lock);
> +
> + 	bus->write(bus, 0x18, 0, page);
> +@@ -285,6 +290,7 @@ ar8xxx_write(struct ar8xxx_priv *priv, i
> + 	ar8xxx_mii_write32(priv, 0x10 | r2, r1, val);
> +
> + 	mutex_unlock(&bus->mdio_lock);
> ++	local_irq_restore(flags);
> + }
> +
> + u32
> --
> 2.26.2
> 
> 
> _______________________________________________
> openwrt-devel mailing list
> openwrt-devel@lists.openwrt.org
> https://lists.openwrt.org/mailman/listinfo/openwrt-devel
Chuanhong Guo Sept. 23, 2020, 3:50 p.m. UTC | #5
Hi!

On Tue, Sep 22, 2020 at 12:37 AM Felix Fietkau <nbd@nbd.name> wrote:
> You could try making a global spinlock in ag71xx_mdio.c and do
> spin_lock_bh(&mdio_lock) in ag71xx_mdio_mii_read and ag71xx_mdio_mii_write

I've just tried that and it doesn't work.
Should I move local_irq_save below mutex_lock and send a v2 or
do you have other suggestions?
Baptiste Jonglez Oct. 3, 2020, 11:01 p.m. UTC | #6
Hi,

Thanks a lot for fixing this!

Can you backport it to openwrt-19.07?  ath79 is also affected there.

Regards,
Baptiste

On 21-09-20, Chuanhong Guo wrote:
> reg accesses on integrated ar8229 sometimes fails. As a result, phy read
> got incorrect port status and wan link goes down and up mysteriously.
> After comparing ar8216 with the old driver, these local_irq_save/restore
> calls are the only meaningful differences I could find and it does fix
> the issue.
> The same changes were added in svn r26856 by Gabor Juhos:
> ar71xx: ag71xx: make switch register access atomic
> 
> As I can't find the underlying problem either, this hack is broght
> back to fix the unstable link issue.
> This hack is only suitable for ath79 mdio and may easily break the
> driver on other platform. Limit it to ath79-only as a target patch.
> 
> Fixes: FS#2216
> Fixes: FS#3226
> Signed-off-by: Chuanhong Guo <gch981213@gmail.com>
> ---
>  .../930-ar8216-make-reg-access-atomic.patch   | 61 +++++++++++++++++++
>  1 file changed, 61 insertions(+)
>  create mode 100644 target/linux/ath79/patches-5.4/930-ar8216-make-reg-access-atomic.patch
> 
> diff --git a/target/linux/ath79/patches-5.4/930-ar8216-make-reg-access-atomic.patch b/target/linux/ath79/patches-5.4/930-ar8216-make-reg-access-atomic.patch
> new file mode 100644
> index 0000000000..42f3305195
> --- /dev/null
> +++ b/target/linux/ath79/patches-5.4/930-ar8216-make-reg-access-atomic.patch
> @@ -0,0 +1,61 @@
> +From b3797d1a92afe97c173b00fdb7824cedba24eef0 Mon Sep 17 00:00:00 2001
> +From: Chuanhong Guo <gch981213@gmail.com>
> +Date: Sun, 20 Sep 2020 01:00:45 +0800
> +Subject: [PATCH] ath79: ar8216: make switch register access atomic
> +
> +due to some unknown reason these register accesses sometimes fail
> +on the integrated switch without this patch.
> +
> +THIS ONLY WORKS ON ATH79 AND MAY BREAK THE DRIVER ON OTHER PLATFORMS!
> +The mdio bus on ath79 works in polling mode and doesn't rely on
> +any interrupt. This patch breaks the driver on any mdio master
> +with interrupts used.
> +
> +---
> +--- a/drivers/net/phy/ar8216.c
> ++++ b/drivers/net/phy/ar8216.c
> +@@ -253,12 +253,14 @@ ar8xxx_mii_write32(struct ar8xxx_priv *p
> + u32
> + ar8xxx_read(struct ar8xxx_priv *priv, int reg)
> + {
> ++	unsigned long flags;
> + 	struct mii_bus *bus = priv->mii_bus;
> + 	u16 r1, r2, page;
> + 	u32 val;
> + 
> + 	split_addr((u32) reg, &r1, &r2, &page);
> + 
> ++	local_irq_save(flags);
> + 	mutex_lock(&bus->mdio_lock);
> + 
> + 	bus->write(bus, 0x18, 0, page);
> +@@ -266,6 +268,7 @@ ar8xxx_read(struct ar8xxx_priv *priv, in
> + 	val = ar8xxx_mii_read32(priv, 0x10 | r2, r1);
> + 
> + 	mutex_unlock(&bus->mdio_lock);
> ++	local_irq_restore(flags);
> + 
> + 	return val;
> + }
> +@@ -273,11 +276,13 @@ ar8xxx_read(struct ar8xxx_priv *priv, in
> + void
> + ar8xxx_write(struct ar8xxx_priv *priv, int reg, u32 val)
> + {
> ++	unsigned long flags;
> + 	struct mii_bus *bus = priv->mii_bus;
> + 	u16 r1, r2, page;
> + 
> + 	split_addr((u32) reg, &r1, &r2, &page);
> + 
> ++	local_irq_save(flags);
> + 	mutex_lock(&bus->mdio_lock);
> + 
> + 	bus->write(bus, 0x18, 0, page);
> +@@ -285,6 +290,7 @@ ar8xxx_write(struct ar8xxx_priv *priv, i
> + 	ar8xxx_mii_write32(priv, 0x10 | r2, r1, val);
> + 
> + 	mutex_unlock(&bus->mdio_lock);
> ++	local_irq_restore(flags);
> + }
> + 
> + u32
Chuanhong Guo Oct. 11, 2020, 4:57 a.m. UTC | #7
Hi!

On Sun, Oct 4, 2020 at 12:33 PM Baptiste Jonglez
<baptiste@bitsofnetworks.org> wrote:
>
> Hi,
>
> Thanks a lot for fixing this!
>
> Can you backport it to openwrt-19.07?  ath79 is also affected there.

backport pushed.
diff mbox series

Patch

diff --git a/target/linux/ath79/patches-5.4/930-ar8216-make-reg-access-atomic.patch b/target/linux/ath79/patches-5.4/930-ar8216-make-reg-access-atomic.patch
new file mode 100644
index 0000000000..42f3305195
--- /dev/null
+++ b/target/linux/ath79/patches-5.4/930-ar8216-make-reg-access-atomic.patch
@@ -0,0 +1,61 @@ 
+From b3797d1a92afe97c173b00fdb7824cedba24eef0 Mon Sep 17 00:00:00 2001
+From: Chuanhong Guo <gch981213@gmail.com>
+Date: Sun, 20 Sep 2020 01:00:45 +0800
+Subject: [PATCH] ath79: ar8216: make switch register access atomic
+
+due to some unknown reason these register accesses sometimes fail
+on the integrated switch without this patch.
+
+THIS ONLY WORKS ON ATH79 AND MAY BREAK THE DRIVER ON OTHER PLATFORMS!
+The mdio bus on ath79 works in polling mode and doesn't rely on
+any interrupt. This patch breaks the driver on any mdio master
+with interrupts used.
+
+---
+--- a/drivers/net/phy/ar8216.c
++++ b/drivers/net/phy/ar8216.c
+@@ -253,12 +253,14 @@ ar8xxx_mii_write32(struct ar8xxx_priv *p
+ u32
+ ar8xxx_read(struct ar8xxx_priv *priv, int reg)
+ {
++	unsigned long flags;
+ 	struct mii_bus *bus = priv->mii_bus;
+ 	u16 r1, r2, page;
+ 	u32 val;
+ 
+ 	split_addr((u32) reg, &r1, &r2, &page);
+ 
++	local_irq_save(flags);
+ 	mutex_lock(&bus->mdio_lock);
+ 
+ 	bus->write(bus, 0x18, 0, page);
+@@ -266,6 +268,7 @@ ar8xxx_read(struct ar8xxx_priv *priv, in
+ 	val = ar8xxx_mii_read32(priv, 0x10 | r2, r1);
+ 
+ 	mutex_unlock(&bus->mdio_lock);
++	local_irq_restore(flags);
+ 
+ 	return val;
+ }
+@@ -273,11 +276,13 @@ ar8xxx_read(struct ar8xxx_priv *priv, in
+ void
+ ar8xxx_write(struct ar8xxx_priv *priv, int reg, u32 val)
+ {
++	unsigned long flags;
+ 	struct mii_bus *bus = priv->mii_bus;
+ 	u16 r1, r2, page;
+ 
+ 	split_addr((u32) reg, &r1, &r2, &page);
+ 
++	local_irq_save(flags);
+ 	mutex_lock(&bus->mdio_lock);
+ 
+ 	bus->write(bus, 0x18, 0, page);
+@@ -285,6 +290,7 @@ ar8xxx_write(struct ar8xxx_priv *priv, i
+ 	ar8xxx_mii_write32(priv, 0x10 | r2, r1, val);
+ 
+ 	mutex_unlock(&bus->mdio_lock);
++	local_irq_restore(flags);
+ }
+ 
+ u32