Message ID | E1gvPHw-0008OD-To@rmk-PC.armlinux.org.uk |
---|---|
State | Changes Requested |
Delegated to: | David Miller |
Headers | show |
Series | [net] net: dsa: fix lockdep warning | expand |
On Sun, Feb 17, 2019 at 04:27:32PM +0000, Russell King wrote: > ====================================================== > WARNING: possible circular locking dependency detected > 4.20.0+ #302 Not tainted > ------------------------------------------------------ Hi Russell Thanks for turning this into a proper patch. I had just started to try to reproduce this and confirm your fix. I will add a tested-by once i do. Thanks Andrew
On Sun, Feb 17, 2019 at 05:46:29PM +0100, Andrew Lunn wrote: > On Sun, Feb 17, 2019 at 04:27:32PM +0000, Russell King wrote: > > ====================================================== > > WARNING: possible circular locking dependency detected > > 4.20.0+ #302 Not tainted > > ------------------------------------------------------ > > Hi Russell > > Thanks for turning this into a proper patch. I had just started to try > to reproduce this and confirm your fix. I will add a tested-by once i > do. Do you have a clearfog board? If so, just add: + interrupt-parent = <&gpio0>; + interrupts = <23 IRQ_TYPE_LEVEL_LOW>; to the DSA switch definitions in arch/arm/boot/dts/armada-388-clearfog.dts
On Sun, Feb 17, 2019 at 04:51:40PM +0000, Russell King - ARM Linux admin wrote: > On Sun, Feb 17, 2019 at 05:46:29PM +0100, Andrew Lunn wrote: > > On Sun, Feb 17, 2019 at 04:27:32PM +0000, Russell King wrote: > > > ====================================================== > > > WARNING: possible circular locking dependency detected > > > 4.20.0+ #302 Not tainted > > > ------------------------------------------------------ > > > > Hi Russell > > > > Thanks for turning this into a proper patch. I had just started to try > > to reproduce this and confirm your fix. I will add a tested-by once i > > do. > > Do you have a clearfog board? If so, just add: > > + interrupt-parent = <&gpio0>; > + interrupts = <23 IRQ_TYPE_LEVEL_LOW>; > > to the DSA switch definitions in > arch/arm/boot/dts/armada-388-clearfog.dts Hi Russell I'm using a different board, but i have a similar interrupt configuration. I just expect that turning on LOCKDEP should be enough. I don't think EPROBE_DEFFER is playing a part here. Andrew
Hi Russell, On 2/17/2019 8:27 AM, Russell King wrote: > ====================================================== > WARNING: possible circular locking dependency detected > 4.20.0+ #302 Not tainted > ------------------------------------------------------ > systemd-udevd/160 is trying to acquire lock: > edea6080 (&chip->reg_lock){+.+.}, at: __setup_irq+0x640/0x704 > > but task is already holding lock: > edff0340 (&desc->request_mutex){+.+.}, at: __setup_irq+0xa0/0x704 > > which lock already depends on the new lock. Since this is specific to mv88e6xxx, in case you resubmit, can you put that in the commit subject as well? Thanks!
On Sun, Feb 17, 2019 at 06:00:24PM +0100, Andrew Lunn wrote: > On Sun, Feb 17, 2019 at 04:51:40PM +0000, Russell King - ARM Linux admin wrote: > > On Sun, Feb 17, 2019 at 05:46:29PM +0100, Andrew Lunn wrote: > > > On Sun, Feb 17, 2019 at 04:27:32PM +0000, Russell King wrote: > > > > ====================================================== > > > > WARNING: possible circular locking dependency detected > > > > 4.20.0+ #302 Not tainted > > > > ------------------------------------------------------ > > > > > > Hi Russell > > > > > > Thanks for turning this into a proper patch. I had just started to try > > > to reproduce this and confirm your fix. I will add a tested-by once i > > > do. > > > > Do you have a clearfog board? If so, just add: > > > > + interrupt-parent = <&gpio0>; > > + interrupts = <23 IRQ_TYPE_LEVEL_LOW>; > > > > to the DSA switch definitions in > > arch/arm/boot/dts/armada-388-clearfog.dts > > Hi Russell > > I'm using a different board, but i have a similar interrupt > configuration. I just expect that turning on LOCKDEP should be enough. > I don't think EPROBE_DEFFER is playing a part here. Yep, lockdep should be enough. In case it does matter, I have DSA built as modules here.
On Sun, Feb 17, 2019 at 04:27:32PM +0000, Russell King wrote: > ====================================================== > WARNING: possible circular locking dependency detected > 4.20.0+ #302 Not tainted > ------------------------------------------------------ > systemd-udevd/160 is trying to acquire lock: > edea6080 (&chip->reg_lock){+.+.}, at: __setup_irq+0x640/0x704 > > but task is already holding lock: > edff0340 (&desc->request_mutex){+.+.}, at: __setup_irq+0xa0/0x704 > > which lock already depends on the new lock. > > the existing dependency chain (in reverse order) is: > > -> #1 (&desc->request_mutex){+.+.}: > mutex_lock_nested+0x1c/0x24 > __setup_irq+0xa0/0x704 > request_threaded_irq+0xd0/0x150 > mv88e6xxx_probe+0x41c/0x694 [mv88e6xxx] > -> #0 (&chip->reg_lock){+.+.}: > __mutex_lock+0x50/0x8b8 > mutex_lock_nested+0x1c/0x24 > __setup_irq+0x640/0x704 > request_threaded_irq+0xd0/0x150 > mv88e6xxx_g2_irq_setup+0xcc/0x1b4 [mv88e6xxx] > mv88e6xxx_probe+0x44c/0x694 [mv88e6xxx] > mdio_probe+0x2c/0x54 > > other info that might help us debug this: > > Possible unsafe locking scenario: > > CPU0 CPU1 > ---- ---- > lock(&desc->request_mutex); > lock(&chip->reg_lock); > lock(&desc->request_mutex); > lock(&chip->reg_lock); Hi Russell I failed to reproduce it on a Freescale system. Which made me take a closer look at the above. This is a false positive. In #1 we are requesting the GPIO interrupt. In #2 we are requesting the chained interrupt from the mv88e6xxx global 1 interrupt handler. So these are different desc->request_mutex. The Freescale VF610 GPIO driver uses gpiochip_irqchip_add(), which creates a lock class for the GPIO. The marvell gpio-mvebu driver does not create a lock class. So when i test on Freescale, lockdep can tell they are different mutex, but on clearfog it cannot. So i think the real fix is probably two fold, although just doing one is sufficient: 1) Add lock classes to gpio-mvebu, by call irq_set_lockdep_class() 2) Add lock classes to chip.c global 1, by calling irq_set_lockdep_class() There is probably more value in 1) since the mvebu gpio driver is much more widely used than the mv88e6xxx driver. Andrew
On Sun, Feb 17, 2019 at 06:55:39PM +0100, Andrew Lunn wrote: > On Sun, Feb 17, 2019 at 04:27:32PM +0000, Russell King wrote: > > ====================================================== > > WARNING: possible circular locking dependency detected > > 4.20.0+ #302 Not tainted > > ------------------------------------------------------ > > systemd-udevd/160 is trying to acquire lock: > > edea6080 (&chip->reg_lock){+.+.}, at: __setup_irq+0x640/0x704 > > > > but task is already holding lock: > > edff0340 (&desc->request_mutex){+.+.}, at: __setup_irq+0xa0/0x704 > > > > which lock already depends on the new lock. > > > > the existing dependency chain (in reverse order) is: > > > > -> #1 (&desc->request_mutex){+.+.}: > > mutex_lock_nested+0x1c/0x24 > > __setup_irq+0xa0/0x704 > > request_threaded_irq+0xd0/0x150 > > mv88e6xxx_probe+0x41c/0x694 [mv88e6xxx] > > > -> #0 (&chip->reg_lock){+.+.}: > > __mutex_lock+0x50/0x8b8 > > mutex_lock_nested+0x1c/0x24 > > __setup_irq+0x640/0x704 > > request_threaded_irq+0xd0/0x150 > > mv88e6xxx_g2_irq_setup+0xcc/0x1b4 [mv88e6xxx] > > mv88e6xxx_probe+0x44c/0x694 [mv88e6xxx] > > mdio_probe+0x2c/0x54 > > > > other info that might help us debug this: > > > > Possible unsafe locking scenario: > > > > CPU0 CPU1 > > ---- ---- > > lock(&desc->request_mutex); > > lock(&chip->reg_lock); > > lock(&desc->request_mutex); > > lock(&chip->reg_lock); > > Hi Russell > > I failed to reproduce it on a Freescale system. Which made me take a > closer look at the above. This is a false positive. > > In #1 we are requesting the GPIO interrupt. In #2 we are requesting > the chained interrupt from the mv88e6xxx global 1 interrupt handler. > So these are different desc->request_mutex. The Freescale VF610 GPIO > driver uses gpiochip_irqchip_add(), which creates a lock class for the > GPIO. The marvell gpio-mvebu driver does not create a lock class. So > when i test on Freescale, lockdep can tell they are different mutex, > but on clearfog it cannot. > > So i think the real fix is probably two fold, although just doing one > is sufficient: > > 1) Add lock classes to gpio-mvebu, by call irq_set_lockdep_class() > 2) Add lock classes to chip.c global 1, by calling irq_set_lockdep_class() > > There is probably more value in 1) since the mvebu gpio driver is much > more widely used than the mv88e6xxx driver. I'd ask one question: is there any reason to hold chip->reg_lock while calling request_threaded_irq()? If not, then surely it would be best not to do so?
====================================================== WARNING: possible circular locking dependency detected 4.20.0+ #302 Not tainted ------------------------------------------------------ systemd-udevd/160 is trying to acquire lock: edea6080 (&chip->reg_lock){+.+.}, at: __setup_irq+0x640/0x704 but task is already holding lock: edff0340 (&desc->request_mutex){+.+.}, at: __setup_irq+0xa0/0x704 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&desc->request_mutex){+.+.}: mutex_lock_nested+0x1c/0x24 __setup_irq+0xa0/0x704 request_threaded_irq+0xd0/0x150 mv88e6xxx_probe+0x41c/0x694 [mv88e6xxx] mdio_probe+0x2c/0x54 really_probe+0x200/0x2c4 driver_probe_device+0x5c/0x174 __driver_attach+0xd8/0xdc bus_for_each_dev+0x58/0x7c bus_add_driver+0xe4/0x1f0 driver_register+0x7c/0x110 mdio_driver_register+0x24/0x58 do_one_initcall+0x74/0x2e8 do_init_module+0x60/0x1d0 load_module+0x1968/0x1ff4 sys_finit_module+0x8c/0x98 ret_fast_syscall+0x0/0x28 0xbedf2ae8 -> #0 (&chip->reg_lock){+.+.}: __mutex_lock+0x50/0x8b8 mutex_lock_nested+0x1c/0x24 __setup_irq+0x640/0x704 request_threaded_irq+0xd0/0x150 mv88e6xxx_g2_irq_setup+0xcc/0x1b4 [mv88e6xxx] mv88e6xxx_probe+0x44c/0x694 [mv88e6xxx] mdio_probe+0x2c/0x54 really_probe+0x200/0x2c4 driver_probe_device+0x5c/0x174 __driver_attach+0xd8/0xdc bus_for_each_dev+0x58/0x7c bus_add_driver+0xe4/0x1f0 driver_register+0x7c/0x110 mdio_driver_register+0x24/0x58 do_one_initcall+0x74/0x2e8 do_init_module+0x60/0x1d0 load_module+0x1968/0x1ff4 sys_finit_module+0x8c/0x98 ret_fast_syscall+0x0/0x28 0xbedf2ae8 other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&desc->request_mutex); lock(&chip->reg_lock); lock(&desc->request_mutex); lock(&chip->reg_lock); *** DEADLOCK *** 2 locks held by systemd-udevd/160: #0: ee040868 (&dev->mutex){....}, at: __driver_attach+0x70/0xdc #1: edff0340 (&desc->request_mutex){+.+.}, at: __setup_irq+0xa0/0x704 stack backtrace: CPU: 1 PID: 160 Comm: systemd-udevd Not tainted 4.20.0+ #302 Hardware name: Marvell Armada 380/385 (Device Tree) [<c0019638>] (unwind_backtrace) from [<c0014888>] (show_stack+0x10/0x14) [<c0014888>] (show_stack) from [<c07f54e0>] (dump_stack+0x9c/0xd4) [<c07f54e0>] (dump_stack) from [<c0088afc>] (print_circular_bug+0x284/0x2d8) [<c0088afc>] (print_circular_bug) from [<c0086b5c>] (__lock_acquire+0x15d4/0x19b8) [<c0086b5c>] (__lock_acquire) from [<c0087828>] (lock_acquire+0xc4/0x1dc) [<c0087828>] (lock_acquire) from [<c080fd88>] (__mutex_lock+0x50/0x8b8) [<c080fd88>] (__mutex_lock) from [<c0810678>] (mutex_lock_nested+0x1c/0x24) [<c0810678>] (mutex_lock_nested) from [<c009e060>] (__setup_irq+0x640/0x704) [<c009e060>] (__setup_irq) from [<c009e2e0>] (request_threaded_irq+0xd0/0x150) [<c009e2e0>] (request_threaded_irq) from [<bf0ce978>] (mv88e6xxx_g2_irq_setup+0xcc/0x1b4 [mv88e6xxx]) [<bf0ce978>] (mv88e6xxx_g2_irq_setup [mv88e6xxx]) from [<bf0c7ab0>] (mv88e6xxx_probe+0x44c/0x694 [mv88e6xxx]) [<bf0c7ab0>] (mv88e6xxx_probe [mv88e6xxx]) from [<c050d420>] (mdio_probe+0x2c/0x54) [<c050d420>] (mdio_probe) from [<c0496eac>] (really_probe+0x200/0x2c4) [<c0496eac>] (really_probe) from [<c0497140>] (driver_probe_device+0x5c/0x174) [<c0497140>] (driver_probe_device) from [<c0497330>] (__driver_attach+0xd8/0xdc) [<c0497330>] (__driver_attach) from [<c0495494>] (bus_for_each_dev+0x58/0x7c) [<c0495494>] (bus_for_each_dev) from [<c04963d4>] (bus_add_driver+0xe4/0x1f0) [<c04963d4>] (bus_add_driver) from [<c0498038>] (driver_register+0x7c/0x110) [<c0498038>] (driver_register) from [<c050d338>] (mdio_driver_register+0x24/0x58) [<c050d338>] (mdio_driver_register) from [<c000afdc>] (do_one_initcall+0x74/0x2e8) [<c000afdc>] (do_one_initcall) from [<c00d4994>] (do_init_module+0x60/0x1d0) [<c00d4994>] (do_init_module) from [<c00d39e0>] (load_module+0x1968/0x1ff4) mvneta f1034000.ethernet eth2: requesting inband/2500base-x, 00200,0000a440 [<c00d39e0>] (load_module) from [<c00d4248>] (sys_finit_module+0x8c/0x98) [<c00d4248>] (sys_finit_module) from [<c0009000>] (ret_fast_syscall+0x0/0x28) Exception stack(0xedfe5fa8 to 0xedfe5ff0) 5fa0: 00020000 00000000 0000000b b6f2a4b5 00000000 00b8fc70 5fc0: 00020000 00000000 00000000 0000017b 00b995a0 00020000 00000000 00b8fc70 5fe0: bedf2af8 bedf2ae8 b6f242ac b6e83d70 This is caused by the locking order inversion in mv88e6xxx_probe: mutex_lock(&chip->reg_lock); if (chip->irq > 0) err = mv88e6xxx_g1_irq_setup(chip); else err = mv88e6xxx_irq_poll_setup(chip); mutex_unlock(&chip->reg_lock); Here, we take chip->reg_lock, and then call into mv88e6xxx_g1_irq_setup() which then calls request_threaded_irq(), taking the request_mutex. However, when we request the g2 interrupt, we call request_threaded_irq() again, which takes the request_mutex, which then goes on to call chip_bus_lock(). This comes through to mv88e6xxx_g1_irq_bus_lock, which then tries to grab chip->reg_lock. This results in the two locks being taken together in differing orders, provoking lockdep to warn. Move the mutex_lock()/unlock() for reg_lock inside mv88e6xxx_g1_irq_free_common() and mv88e6xxx_g1_irq_setup_common(), where we actually access registers, thereby avoiding holding it while calling request_threaded_irq() or setting up the delayed work. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> --- drivers/net/dsa/mv88e6xxx/chip.c | 11 +++++------ 1 file changed, 5 insertions(+), 6 deletions(-) diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c index 24fb6a685039..801442195a04 100644 --- a/drivers/net/dsa/mv88e6xxx/chip.c +++ b/drivers/net/dsa/mv88e6xxx/chip.c @@ -349,9 +349,11 @@ static void mv88e6xxx_g1_irq_free_common(struct mv88e6xxx_chip *chip) int irq, virq; u16 mask; + mutex_lock(&chip->reg_lock); mv88e6xxx_g1_read(chip, MV88E6XXX_G1_CTL1, &mask); mask &= ~GENMASK(chip->g1_irq.nirqs, 0); mv88e6xxx_g1_write(chip, MV88E6XXX_G1_CTL1, mask); + mutex_unlock(&chip->reg_lock); for (irq = 0; irq < chip->g1_irq.nirqs; irq++) { virq = irq_find_mapping(chip->g1_irq.domain, irq); @@ -369,9 +371,7 @@ static void mv88e6xxx_g1_irq_free(struct mv88e6xxx_chip *chip) */ free_irq(chip->irq, chip); - mutex_lock(&chip->reg_lock); mv88e6xxx_g1_irq_free_common(chip); - mutex_unlock(&chip->reg_lock); } static int mv88e6xxx_g1_irq_setup_common(struct mv88e6xxx_chip *chip) @@ -392,6 +392,7 @@ static int mv88e6xxx_g1_irq_setup_common(struct mv88e6xxx_chip *chip) chip->g1_irq.chip = mv88e6xxx_g1_irq_chip; chip->g1_irq.masked = ~0; + mutex_lock(&chip->reg_lock); err = mv88e6xxx_g1_read(chip, MV88E6XXX_G1_CTL1, &mask); if (err) goto out_mapping; @@ -406,6 +407,7 @@ static int mv88e6xxx_g1_irq_setup_common(struct mv88e6xxx_chip *chip) err = mv88e6xxx_g1_read(chip, MV88E6XXX_G1_STS, ®); if (err) goto out_disable; + mutex_unlock(&chip->reg_lock); return 0; @@ -414,6 +416,7 @@ static int mv88e6xxx_g1_irq_setup_common(struct mv88e6xxx_chip *chip) mv88e6xxx_g1_write(chip, MV88E6XXX_G1_CTL1, mask); out_mapping: + mutex_unlock(&chip->reg_lock); for (irq = 0; irq < 16; irq++) { virq = irq_find_mapping(chip->g1_irq.domain, irq); irq_dispose_mapping(virq); @@ -479,9 +482,7 @@ static void mv88e6xxx_irq_poll_free(struct mv88e6xxx_chip *chip) kthread_cancel_delayed_work_sync(&chip->irq_poll_work); kthread_destroy_worker(chip->kworker); - mutex_lock(&chip->reg_lock); mv88e6xxx_g1_irq_free_common(chip); - mutex_unlock(&chip->reg_lock); } int mv88e6xxx_wait(struct mv88e6xxx_chip *chip, int addr, int reg, u16 mask) @@ -4718,12 +4719,10 @@ static int mv88e6xxx_probe(struct mdio_device *mdiodev) * the PHYs will link their interrupts to these interrupt * controllers */ - mutex_lock(&chip->reg_lock); if (chip->irq > 0) err = mv88e6xxx_g1_irq_setup(chip); else err = mv88e6xxx_irq_poll_setup(chip); - mutex_unlock(&chip->reg_lock); if (err) goto out;