Message ID | 20200921065744.491098-1-gch981213@gmail.com |
---|---|
State | Rejected |
Delegated to: | Chuanhong Guo |
Headers | show |
Series | ath79: ar8216: make switch register access atomic | expand |
On 2020-09-21 08:57, Chuanhong Guo wrote: > reg accesses on integrated ar8229 sometimes fails. As a result, phy read > got incorrect port status and wan link goes down and up mysteriously. > After comparing ar8216 with the old driver, these local_irq_save/restore > calls are the only meaningful differences I could find and it does fix > the issue. > The same changes were added in svn r26856 by Gabor Juhos: > ar71xx: ag71xx: make switch register access atomic > > As I can't find the underlying problem either, this hack is broght > back to fix the unstable link issue. > This hack is only suitable for ath79 mdio and may easily break the > driver on other platform. Limit it to ath79-only as a target patch. > > Fixes: FS#2216 > Fixes: FS#3226 > Signed-off-by: Chuanhong Guo <gch981213@gmail.com> > --- > .../930-ar8216-make-reg-access-atomic.patch | 61 +++++++++++++++++++ > 1 file changed, 61 insertions(+) > create mode 100644 target/linux/ath79/patches-5.4/930-ar8216-make-reg-access-atomic.patch > > diff --git a/target/linux/ath79/patches-5.4/930-ar8216-make-reg-access-atomic.patch b/target/linux/ath79/patches-5.4/930-ar8216-make-reg-access-atomic.patch > new file mode 100644 > index 0000000000..42f3305195 > --- /dev/null > +++ b/target/linux/ath79/patches-5.4/930-ar8216-make-reg-access-atomic.patch > @@ -0,0 +1,61 @@ > +From b3797d1a92afe97c173b00fdb7824cedba24eef0 Mon Sep 17 00:00:00 2001 > +From: Chuanhong Guo <gch981213@gmail.com> > +Date: Sun, 20 Sep 2020 01:00:45 +0800 > +Subject: [PATCH] ath79: ar8216: make switch register access atomic > + > +due to some unknown reason these register accesses sometimes fail > +on the integrated switch without this patch. > + > +THIS ONLY WORKS ON ATH79 AND MAY BREAK THE DRIVER ON OTHER PLATFORMS! > +The mdio bus on ath79 works in polling mode and doesn't rely on > +any interrupt. This patch breaks the driver on any mdio master > +with interrupts used. > + > +--- > +--- a/drivers/net/phy/ar8216.c > ++++ b/drivers/net/phy/ar8216.c > +@@ -253,12 +253,14 @@ ar8xxx_mii_write32(struct ar8xxx_priv *p > + u32 > + ar8xxx_read(struct ar8xxx_priv *priv, int reg) > + { > ++ unsigned long flags; > + struct mii_bus *bus = priv->mii_bus; > + u16 r1, r2, page; > + u32 val; > + > + split_addr((u32) reg, &r1, &r2, &page); > + > ++ local_irq_save(flags); > + mutex_lock(&bus->mdio_lock); Taking a mutex in an irq-disabled section is basically asking for a deadlock. What router is this issue reproduced on? What else is running at the time this happens? - Felix
Hi! On Mon, Sep 21, 2020 at 9:24 PM Felix Fietkau <nbd@nbd.name> wrote: > > + > > + split_addr((u32) reg, &r1, &r2, &page); > > + > > ++ local_irq_save(flags); > > + mutex_lock(&bus->mdio_lock); > Taking a mutex in an irq-disabled section is basically asking for a > deadlock. What router is this issue reproduced on? Any ar934x/qca953x router using built-in switch, based on Flyspray bug reports. I can reproduce it on ar9341/tp-link-wr841n-v8 As there's an ancient commit about the same issue I guess it appears on ar724x/ar933x as well. here's the commit message: commit 5d77f370d695c9a70f25ffb8367db64efadaaedd Author: Gabor Juhos <juhosg@openwrt.org> Date: Sun May 8 16:32:53 2011 +0000 ar71xx: ag71xx: make switch register access atomic Reading of the PHY registers occasionally returns with bogus values under heavy load. This misleads the PHY driver and thus causes false link/speed change notifications which leads to performance loss. [and some dmesg after this] > What else is running at the time this happens? I'm using a minimal wr841n-v8 image. (make menuconfig, select the device, save and build the firmware) no extra background tasks are running. swconfig led trigger is used so the kernel is constantly polling port status from switch in the background.
On 2020-09-21 18:26, Chuanhong Guo wrote: > Hi! > > On Mon, Sep 21, 2020 at 9:24 PM Felix Fietkau <nbd@nbd.name> wrote: >> > + >> > + split_addr((u32) reg, &r1, &r2, &page); >> > + >> > ++ local_irq_save(flags); >> > + mutex_lock(&bus->mdio_lock); >> Taking a mutex in an irq-disabled section is basically asking for a >> deadlock. What router is this issue reproduced on? > > Any ar934x/qca953x router using built-in switch, based on > Flyspray bug reports. I can reproduce it on ar9341/tp-link-wr841n-v8 > As there's an ancient commit about the same issue I guess > it appears on ar724x/ar933x as well. > here's the commit message: > commit 5d77f370d695c9a70f25ffb8367db64efadaaedd > Author: Gabor Juhos <juhosg@openwrt.org> > Date: Sun May 8 16:32:53 2011 +0000 > > ar71xx: ag71xx: make switch register access atomic > > Reading of the PHY registers occasionally returns with bogus values > under heavy load. This misleads the PHY driver and thus causes false > link/speed change notifications which leads to performance loss. > [and some dmesg after this] > >> What else is running at the time this happens? > > I'm using a minimal wr841n-v8 image. (make menuconfig, > select the device, save and build the firmware) > no extra background tasks are running. > swconfig led trigger is used so the kernel is constantly > polling port status from switch in the background. I have another idea what you could try. Many SoCs enable both MDIO busses, and maybe they have some shared resources/state in the hw. You could try making a global spinlock in ag71xx_mdio.c and do spin_lock_bh(&mdio_lock) in ag71xx_mdio_mii_read and ag71xx_mdio_mii_write - Felix
> -----Original Message----- > From: openwrt-devel [mailto:openwrt-devel-bounces@lists.openwrt.org] > On Behalf Of Chuanhong Guo > Sent: Montag, 21. September 2020 08:58 > To: openwrt-devel@openwrt.org > Cc: Chuanhong Guo <gch981213@gmail.com> > Subject: [PATCH] ath79: ar8216: make switch register access atomic > > reg accesses on integrated ar8229 sometimes fails. As a result, phy read got > incorrect port status and wan link goes down and up mysteriously. > After comparing ar8216 with the old driver, these local_irq_save/restore calls > are the only meaningful differences I could find and it does fix the issue. > The same changes were added in svn r26856 by Gabor Juhos: > ar71xx: ag71xx: make switch register access atomic > > As I can't find the underlying problem either, this hack is broght back to fix > the unstable link issue. > This hack is only suitable for ath79 mdio and may easily break the driver on > other platform. Limit it to ath79-only as a target patch. This patch fixes the problem for me on TL-WR841N v12 (qca9533), tested on top of 19.07.4. Tested-by: Adrian Schmutzler <freifunk@adrianschmutzler.de> Acked-by: Adrian Schmutzler <freifunk@adrianschmutzler.de> Best Adrian > > Fixes: FS#2216 > Fixes: FS#3226 > Signed-off-by: Chuanhong Guo <gch981213@gmail.com> > --- > .../930-ar8216-make-reg-access-atomic.patch | 61 +++++++++++++++++++ > 1 file changed, 61 insertions(+) > create mode 100644 target/linux/ath79/patches-5.4/930-ar8216-make-reg- > access-atomic.patch > > diff --git a/target/linux/ath79/patches-5.4/930-ar8216-make-reg-access- > atomic.patch b/target/linux/ath79/patches-5.4/930-ar8216-make-reg-access- > atomic.patch > new file mode 100644 > index 0000000000..42f3305195 > --- /dev/null > +++ b/target/linux/ath79/patches-5.4/930-ar8216-make-reg-access-atomic.p > +++ atch > @@ -0,0 +1,61 @@ > +From b3797d1a92afe97c173b00fdb7824cedba24eef0 Mon Sep 17 00:00:00 > 2001 > +From: Chuanhong Guo <gch981213@gmail.com> > +Date: Sun, 20 Sep 2020 01:00:45 +0800 > +Subject: [PATCH] ath79: ar8216: make switch register access atomic > + > +due to some unknown reason these register accesses sometimes fail on > +the integrated switch without this patch. > + > +THIS ONLY WORKS ON ATH79 AND MAY BREAK THE DRIVER ON OTHER > PLATFORMS! > +The mdio bus on ath79 works in polling mode and doesn't rely on any > +interrupt. This patch breaks the driver on any mdio master with > +interrupts used. > + > +--- > +--- a/drivers/net/phy/ar8216.c > ++++ b/drivers/net/phy/ar8216.c > +@@ -253,12 +253,14 @@ ar8xxx_mii_write32(struct ar8xxx_priv *p > + u32 > + ar8xxx_read(struct ar8xxx_priv *priv, int reg) { > ++ unsigned long flags; > + struct mii_bus *bus = priv->mii_bus; > + u16 r1, r2, page; > + u32 val; > + > + split_addr((u32) reg, &r1, &r2, &page); > + > ++ local_irq_save(flags); > + mutex_lock(&bus->mdio_lock); > + > + bus->write(bus, 0x18, 0, page); > +@@ -266,6 +268,7 @@ ar8xxx_read(struct ar8xxx_priv *priv, in > + val = ar8xxx_mii_read32(priv, 0x10 | r2, r1); > + > + mutex_unlock(&bus->mdio_lock); > ++ local_irq_restore(flags); > + > + return val; > + } > +@@ -273,11 +276,13 @@ ar8xxx_read(struct ar8xxx_priv *priv, in void > +ar8xxx_write(struct ar8xxx_priv *priv, int reg, u32 val) { > ++ unsigned long flags; > + struct mii_bus *bus = priv->mii_bus; > + u16 r1, r2, page; > + > + split_addr((u32) reg, &r1, &r2, &page); > + > ++ local_irq_save(flags); > + mutex_lock(&bus->mdio_lock); > + > + bus->write(bus, 0x18, 0, page); > +@@ -285,6 +290,7 @@ ar8xxx_write(struct ar8xxx_priv *priv, i > + ar8xxx_mii_write32(priv, 0x10 | r2, r1, val); > + > + mutex_unlock(&bus->mdio_lock); > ++ local_irq_restore(flags); > + } > + > + u32 > -- > 2.26.2 > > > _______________________________________________ > openwrt-devel mailing list > openwrt-devel@lists.openwrt.org > https://lists.openwrt.org/mailman/listinfo/openwrt-devel
Hi! On Tue, Sep 22, 2020 at 12:37 AM Felix Fietkau <nbd@nbd.name> wrote: > You could try making a global spinlock in ag71xx_mdio.c and do > spin_lock_bh(&mdio_lock) in ag71xx_mdio_mii_read and ag71xx_mdio_mii_write I've just tried that and it doesn't work. Should I move local_irq_save below mutex_lock and send a v2 or do you have other suggestions?
Hi, Thanks a lot for fixing this! Can you backport it to openwrt-19.07? ath79 is also affected there. Regards, Baptiste On 21-09-20, Chuanhong Guo wrote: > reg accesses on integrated ar8229 sometimes fails. As a result, phy read > got incorrect port status and wan link goes down and up mysteriously. > After comparing ar8216 with the old driver, these local_irq_save/restore > calls are the only meaningful differences I could find and it does fix > the issue. > The same changes were added in svn r26856 by Gabor Juhos: > ar71xx: ag71xx: make switch register access atomic > > As I can't find the underlying problem either, this hack is broght > back to fix the unstable link issue. > This hack is only suitable for ath79 mdio and may easily break the > driver on other platform. Limit it to ath79-only as a target patch. > > Fixes: FS#2216 > Fixes: FS#3226 > Signed-off-by: Chuanhong Guo <gch981213@gmail.com> > --- > .../930-ar8216-make-reg-access-atomic.patch | 61 +++++++++++++++++++ > 1 file changed, 61 insertions(+) > create mode 100644 target/linux/ath79/patches-5.4/930-ar8216-make-reg-access-atomic.patch > > diff --git a/target/linux/ath79/patches-5.4/930-ar8216-make-reg-access-atomic.patch b/target/linux/ath79/patches-5.4/930-ar8216-make-reg-access-atomic.patch > new file mode 100644 > index 0000000000..42f3305195 > --- /dev/null > +++ b/target/linux/ath79/patches-5.4/930-ar8216-make-reg-access-atomic.patch > @@ -0,0 +1,61 @@ > +From b3797d1a92afe97c173b00fdb7824cedba24eef0 Mon Sep 17 00:00:00 2001 > +From: Chuanhong Guo <gch981213@gmail.com> > +Date: Sun, 20 Sep 2020 01:00:45 +0800 > +Subject: [PATCH] ath79: ar8216: make switch register access atomic > + > +due to some unknown reason these register accesses sometimes fail > +on the integrated switch without this patch. > + > +THIS ONLY WORKS ON ATH79 AND MAY BREAK THE DRIVER ON OTHER PLATFORMS! > +The mdio bus on ath79 works in polling mode and doesn't rely on > +any interrupt. This patch breaks the driver on any mdio master > +with interrupts used. > + > +--- > +--- a/drivers/net/phy/ar8216.c > ++++ b/drivers/net/phy/ar8216.c > +@@ -253,12 +253,14 @@ ar8xxx_mii_write32(struct ar8xxx_priv *p > + u32 > + ar8xxx_read(struct ar8xxx_priv *priv, int reg) > + { > ++ unsigned long flags; > + struct mii_bus *bus = priv->mii_bus; > + u16 r1, r2, page; > + u32 val; > + > + split_addr((u32) reg, &r1, &r2, &page); > + > ++ local_irq_save(flags); > + mutex_lock(&bus->mdio_lock); > + > + bus->write(bus, 0x18, 0, page); > +@@ -266,6 +268,7 @@ ar8xxx_read(struct ar8xxx_priv *priv, in > + val = ar8xxx_mii_read32(priv, 0x10 | r2, r1); > + > + mutex_unlock(&bus->mdio_lock); > ++ local_irq_restore(flags); > + > + return val; > + } > +@@ -273,11 +276,13 @@ ar8xxx_read(struct ar8xxx_priv *priv, in > + void > + ar8xxx_write(struct ar8xxx_priv *priv, int reg, u32 val) > + { > ++ unsigned long flags; > + struct mii_bus *bus = priv->mii_bus; > + u16 r1, r2, page; > + > + split_addr((u32) reg, &r1, &r2, &page); > + > ++ local_irq_save(flags); > + mutex_lock(&bus->mdio_lock); > + > + bus->write(bus, 0x18, 0, page); > +@@ -285,6 +290,7 @@ ar8xxx_write(struct ar8xxx_priv *priv, i > + ar8xxx_mii_write32(priv, 0x10 | r2, r1, val); > + > + mutex_unlock(&bus->mdio_lock); > ++ local_irq_restore(flags); > + } > + > + u32
Hi! On Sun, Oct 4, 2020 at 12:33 PM Baptiste Jonglez <baptiste@bitsofnetworks.org> wrote: > > Hi, > > Thanks a lot for fixing this! > > Can you backport it to openwrt-19.07? ath79 is also affected there. backport pushed.
diff --git a/target/linux/ath79/patches-5.4/930-ar8216-make-reg-access-atomic.patch b/target/linux/ath79/patches-5.4/930-ar8216-make-reg-access-atomic.patch new file mode 100644 index 0000000000..42f3305195 --- /dev/null +++ b/target/linux/ath79/patches-5.4/930-ar8216-make-reg-access-atomic.patch @@ -0,0 +1,61 @@ +From b3797d1a92afe97c173b00fdb7824cedba24eef0 Mon Sep 17 00:00:00 2001 +From: Chuanhong Guo <gch981213@gmail.com> +Date: Sun, 20 Sep 2020 01:00:45 +0800 +Subject: [PATCH] ath79: ar8216: make switch register access atomic + +due to some unknown reason these register accesses sometimes fail +on the integrated switch without this patch. + +THIS ONLY WORKS ON ATH79 AND MAY BREAK THE DRIVER ON OTHER PLATFORMS! +The mdio bus on ath79 works in polling mode and doesn't rely on +any interrupt. This patch breaks the driver on any mdio master +with interrupts used. + +--- +--- a/drivers/net/phy/ar8216.c ++++ b/drivers/net/phy/ar8216.c +@@ -253,12 +253,14 @@ ar8xxx_mii_write32(struct ar8xxx_priv *p + u32 + ar8xxx_read(struct ar8xxx_priv *priv, int reg) + { ++ unsigned long flags; + struct mii_bus *bus = priv->mii_bus; + u16 r1, r2, page; + u32 val; + + split_addr((u32) reg, &r1, &r2, &page); + ++ local_irq_save(flags); + mutex_lock(&bus->mdio_lock); + + bus->write(bus, 0x18, 0, page); +@@ -266,6 +268,7 @@ ar8xxx_read(struct ar8xxx_priv *priv, in + val = ar8xxx_mii_read32(priv, 0x10 | r2, r1); + + mutex_unlock(&bus->mdio_lock); ++ local_irq_restore(flags); + + return val; + } +@@ -273,11 +276,13 @@ ar8xxx_read(struct ar8xxx_priv *priv, in + void + ar8xxx_write(struct ar8xxx_priv *priv, int reg, u32 val) + { ++ unsigned long flags; + struct mii_bus *bus = priv->mii_bus; + u16 r1, r2, page; + + split_addr((u32) reg, &r1, &r2, &page); + ++ local_irq_save(flags); + mutex_lock(&bus->mdio_lock); + + bus->write(bus, 0x18, 0, page); +@@ -285,6 +290,7 @@ ar8xxx_write(struct ar8xxx_priv *priv, i + ar8xxx_mii_write32(priv, 0x10 | r2, r1, val); + + mutex_unlock(&bus->mdio_lock); ++ local_irq_restore(flags); + } + + u32
reg accesses on integrated ar8229 sometimes fails. As a result, phy read got incorrect port status and wan link goes down and up mysteriously. After comparing ar8216 with the old driver, these local_irq_save/restore calls are the only meaningful differences I could find and it does fix the issue. The same changes were added in svn r26856 by Gabor Juhos: ar71xx: ag71xx: make switch register access atomic As I can't find the underlying problem either, this hack is broght back to fix the unstable link issue. This hack is only suitable for ath79 mdio and may easily break the driver on other platform. Limit it to ath79-only as a target patch. Fixes: FS#2216 Fixes: FS#3226 Signed-off-by: Chuanhong Guo <gch981213@gmail.com> --- .../930-ar8216-make-reg-access-atomic.patch | 61 +++++++++++++++++++ 1 file changed, 61 insertions(+) create mode 100644 target/linux/ath79/patches-5.4/930-ar8216-make-reg-access-atomic.patch