Patchwork Re: kernfs/rtc: circular dependency between kernfs and ops_lock

Submitter Alessandro Zummo
Date March 31, 2014, 9:46 a.m.
Message ID <20140331114627.5e1a4609@linux.lan.towertech.it>
Permalink /patch/335235/
State New

Comments

Alessandro Zummo - March 31, 2014, 9:46 a.m.
On Sun, 30 Mar 2014 12:04:16 -0400
Sasha Levin <sasha.levin@oracle.com> wrote:

> > Look good, thanks!  
> 
> Or not...
> 
> Hit it again during overnight fuzzing:
> 

 I think this is a different bug, please try this.
Lars-Peter Clausen - March 31, 2014, 9:52 a.m.
On 03/31/2014 11:46 AM, Alessandro Zummo wrote:
> On Sun, 30 Mar 2014 12:04:16 -0400
> Sasha Levin <sasha.levin@oracle.com> wrote:
>
>>> Look good, thanks!
>>
>> Or not...
>>
>> Hit it again during overnight fuzzing:
>>
>
>   I think this is a different bug, please try this.

It's the same bug, device_unregister(&rtc->dev) will try to unregister all 
sysfs files attached to the device and that function is still called with 
the rtc mutex held.

- Lars
Alessandro Zummo - March 31, 2014, 10:43 a.m.
On Mon, 31 Mar 2014 11:52:36 +0200
Lars-Peter Clausen <lars@metafoo.de> wrote:

> It's the same bug, device_unregister(&rtc->dev) will try to unregister all 
> sysfs files attached to the device and that function is still called with 
> the rtc mutex held.

 rtc-cmos tries to unregister the rtc device in the probe function,
 which should not be done. 

 The last patch fixes it. I'll keep trying to trigger the bug on my systems.
Lars-Peter Clausen - March 31, 2014, 11:07 a.m.
On 03/31/2014 12:43 PM, Alessandro Zummo wrote:
> On Mon, 31 Mar 2014 11:52:36 +0200
> Lars-Peter Clausen <lars@metafoo.de> wrote:
>
>> It's the same bug, device_unregister(&rtc->dev) will try to unregister all
>> sysfs files attached to the device and that function is still called with
>> the rtc mutex held.
>
>   rtc-cmos tries to unregister the rtc device in the probe function,
>   which should not be done.
>
>   The last patch fixes it. I'll keep trying to trigger the bug on my systems.
>

It doesn't really matter where it is unregistered. device_unregister() will 
(somewhere down its call chain) take the kernfs lock, hence it must not be 
called with the rtc mutex held.

- Lars
Alessandro Zummo - March 31, 2014, 12:03 p.m.
On Mon, 31 Mar 2014 13:07:10 +0200
Lars-Peter Clausen <lars@metafoo.de> wrote:

> It doesn't really matter where it is unregistered. device_unregister() will 
> (somewhere down its call chain) take the kernfs lock, hence it must not be 
> called with the rtc mutex held.

 Maybe device_remove_attrs could be called in the rtc base class,
 before the device removal?
Lars-Peter Clausen - March 31, 2014, 12:19 p.m.
On 03/31/2014 02:03 PM, Alessandro Zummo wrote:
> On Mon, 31 Mar 2014 13:07:10 +0200
> Lars-Peter Clausen <lars@metafoo.de> wrote:
>
>> It doesn't really matter where it is unregistered. device_unregister() will
>> (somewhere down its call chain) take the kernfs lock, hence it must not be
>> called with the rtc mutex held.
>
>   Maybe device_remove_attrs could be called in the rtc base class,
>   before the device removal?
>

Just move the device_unregister() call outside the lock. I think the only 
thing that needs to be protected is the ops = NULL assignment. Moving the 
unregister after the unlock also means that the extra 
get_device()/put_device() pair can be removed.
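
[Editor's illustration, not part of the original message.] The lock
ordering suggested above can be sketched in user space. This is only an
analogy under stated assumptions: pthread mutexes stand in for the rtc
mutex and the kernfs lock, and every toy_* name is hypothetical. It is
not the kernel's rtc_device_unregister() and not a tested fix. The point
it shows is that only the ops = NULL assignment sits under the ops lock,
while the teardown that needs the other lock runs after the unlock, so
the two locks are never taken in the opposite order to the reader path.

```c
/* User-space analogy only; all toy_* names are hypothetical. */
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_ops {
	int (*read_time)(void);
};

struct toy_rtc {
	pthread_mutex_t ops_lock;   /* stands in for the rtc mutex */
	const struct toy_ops *ops;  /* NULL once the device is going away */
	bool registered;
};

/* Stands in for the kernfs lock taken while sysfs files are active. */
static pthread_mutex_t toy_sysfs_lock = PTHREAD_MUTEX_INITIALIZER;

static int toy_read_time(void)
{
	return 42;
}

static const struct toy_ops toy_ops_impl = { .read_time = toy_read_time };

static void toy_rtc_init(struct toy_rtc *rtc)
{
	pthread_mutex_init(&rtc->ops_lock, NULL);
	rtc->ops = &toy_ops_impl;
	rtc->registered = true;
}

/* Reader path: "sysfs" lock first, then the ops lock. */
static int toy_show_time(struct toy_rtc *rtc)
{
	int ret = -1;

	pthread_mutex_lock(&toy_sysfs_lock);
	pthread_mutex_lock(&rtc->ops_lock);
	if (rtc->ops)
		ret = rtc->ops->read_time();
	pthread_mutex_unlock(&rtc->ops_lock);
	pthread_mutex_unlock(&toy_sysfs_lock);
	return ret;
}

/* Teardown that needs the "sysfs" lock, like device_unregister() does. */
static void toy_device_unregister(struct toy_rtc *rtc)
{
	pthread_mutex_lock(&toy_sysfs_lock);
	rtc->registered = false;
	pthread_mutex_unlock(&toy_sysfs_lock);
}

/* Only the ops = NULL assignment is protected by the ops lock. */
static void toy_rtc_unregister(struct toy_rtc *rtc)
{
	assert(rtc != NULL);

	pthread_mutex_lock(&rtc->ops_lock);
	rtc->ops = NULL;
	pthread_mutex_unlock(&rtc->ops_lock);

	/* Called with no ops lock held, so no ops_lock -> sysfs ordering. */
	toy_device_unregister(rtc);
}
```

If toy_device_unregister() were instead called between the lock and
unlock in toy_rtc_unregister(), the unregister path would take the locks
in the order ops_lock then toy_sysfs_lock, the reverse of
toy_show_time(), which is the circular dependency this thread is about.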
Sasha Levin - April 2, 2014, 10:51 p.m.
On 03/31/2014 08:19 AM, Lars-Peter Clausen wrote:
> On 03/31/2014 02:03 PM, Alessandro Zummo wrote:
>> On Mon, 31 Mar 2014 13:07:10 +0200
>> Lars-Peter Clausen <lars@metafoo.de> wrote:
>>
>>> It doesn't really matter where it is unregistered. device_unregister() will
>>> (somewhere down its call chain) take the kernfs lock, hence it must not be
>>> called with the rtc mutex held.
>>
>>   Maybe device_remove_attrs could be called in the rtc base class,
>>   before the device removal?
>>
> 
> Just move the device_unregister() call outside the lock. I think the only thing that needs to be protected is the ops = NULL assignment. Moving the unregister after the unlock also means that the extra get_device()/put_device() pair can be removed.

That seems to cause errors on boot:

[   23.714976] BUG: unable to handle kernel NULL pointer dereference at 0000000000000090
[   23.716017] IP: [<ffffffffb438ee5f>] kernfs_find_ns+0x1f/0x160
[   23.716620] PGD 0
[   23.716843] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
[   23.717448] Dumping ftrace buffer:
[   23.717861]    (ftrace buffer empty)
[   23.718233] Modules linked in:
[   23.718561] CPU: 2 PID: 1 Comm: swapper/0 Not tainted 3.14.0-next-20140402-sasha-00013-g0cfaf7e-dirty #364
[   23.719523] task: ffff8806ec5b0000 ti: ffff88003e284000 task.ti: ffff88003e284000
[   23.720134] RIP: 0010:[<ffffffffb438ee5f>]  [<ffffffffb438ee5f>] kernfs_find_ns+0x1f/0x160
[   23.720134] RSP: 0000:ffff88003e285af8  EFLAGS: 00010292
[   23.720134] RAX: 0000000080000000 RBX: 0000000000000000 RCX: 0000000000000006
[   23.720134] RDX: 0000000000000000 RSI: ffffffffb78bd6c0 RDI: 0000000000000000
[   23.720134] RBP: ffff88003e285b28 R08: 0000000000000000 R09: 0000000000000000
[   23.720134] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
[   23.720134] R13: ffffffffb78bd6c0 R14: 0000000000000000 R15: 0000000000000008
[   23.720134] FS:  0000000000000000(0000) GS:ffff8800bec00000(0000) knlGS:0000000000000000
[   23.720134] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[   23.720134] CR2: 0000000000000090 CR3: 0000000038e2c000 CR4: 00000000000006a0
[   23.720134] Stack:
[   23.720134]  ffff88003e285b28 0000000000000000 ffffffffb78bd6c0 0000000000000000
[   23.720134]  0000000000000080 0000000000000008 ffff88003e285b58 ffffffffb438f0be
[   23.720134]  ffff880000000010 ffffffffb9c0f6c0 ffff8800bdd89148 ffffffffb8e430b0
[   23.720134] Call Trace:
[   23.720134]  [<ffffffffb438f0be>] kernfs_find_and_get_ns+0x3e/0x70
[   23.730419] kobject: 'tlan' (ffff88026bf2a760): kobject_cleanup, parent ffff8803ec71efe8
[   23.730422] kobject: 'tlan' (ffff88026bf2a760): auto cleanup 'remove' event
[   23.730424] kobject: 'tlan' (ffff88026bf2a760): kobject_uevent_env
[   23.730460] kobject: 'tlan' (ffff88026bf2a760): fill_kobj_path: path = '/bus/pci/drivers/tlan'
[   23.730559] kobject: 'tlan' (ffff88026bf2a760): auto cleanup kobject_del
[   23.720134]  [<ffffffffb439392d>] sysfs_unmerge_group+0x1d/0x70
[   23.720134]  [<ffffffffb50d858b>] dpm_sysfs_remove+0x2b/0x70
[   23.720134]  [<ffffffffb50cdd97>] device_del+0x47/0x1c0
[   23.720134]  [<ffffffffb50cdf68>] device_unregister+0x58/0x70
[   23.720134]  [<ffffffffb636bb95>] rtc_device_unregister+0x65/0x80
[   23.720134]  [<ffffffffb63720a3>] cmos_do_probe+0x3b3/0x400
[   23.720134]  [<ffffffffba35ddd3>] cmos_platform_probe+0x44/0x4d
[   23.720134]  [<ffffffffb50d3862>] platform_drv_probe+0x32/0x90
[   23.720134]  [<ffffffffb50d1ba5>] driver_probe_device+0x175/0x370
[   23.720134]  [<ffffffffb50d1e24>] __driver_attach+0x84/0xc0
[   23.720134]  [<ffffffffb50d1da0>] ? driver_probe_device+0x370/0x370
[   23.720134]  [<ffffffffb50cfa39>] bus_for_each_dev+0x69/0xb0
[   23.720134]  [<ffffffffb50d13de>] driver_attach+0x1e/0x20
[   23.720134]  [<ffffffffb50d1018>] bus_add_driver+0x138/0x260
[   23.720134]  [<ffffffffba35dcf0>] ? bq4802_driver_init+0x14/0x14
[   23.720134]  [<ffffffffb50d2608>] driver_register+0x98/0xe0
[   23.720134]  [<ffffffffba35dcf0>] ? bq4802_driver_init+0x14/0x14
[   23.720134]  [<ffffffffb50d36da>] __platform_driver_register+0x4a/0x50
[   23.720134]  [<ffffffffb50d3704>] platform_driver_probe+0x24/0xc0
[   23.720134]  [<ffffffffba35dcf0>] ? bq4802_driver_init+0x14/0x14
[   23.720134]  [<ffffffffba35dd2c>] cmos_init+0x3c/0x74
[   23.720134]  [<ffffffffb400216a>] do_one_initcall+0xca/0x1d0
[   23.720134]  [<ffffffffb4185100>] ? parse_args+0x280/0x460
[   23.720134]  [<ffffffffba2b8128>] kernel_init_freeable+0x1d5/0x26c
[   23.720134]  [<ffffffffba2b7848>] ? loglevel+0x31/0x31
[   23.720134]  [<ffffffffb74a1fe0>] ? rest_init+0x140/0x140
[   23.720134]  [<ffffffffb74a1fee>] kernel_init+0xe/0x100
[   23.720134]  [<ffffffffb7514b7c>] ret_from_fork+0x7c/0xb0
[   23.720134]  [<ffffffffb74a1fe0>] ? rest_init+0x140/0x140
[   23.720134] Code: 8b 75 f8 c9 c3 0f 1f 80 00 00 00 00 66 66 66 66 90 55 48 89 e5 41 57 41 56 49 89 d6 41 55 49 89 f5 41 54 49 89 fc 53 48 83 ec 08 <0f> b7 87 90 00 00 00 8b 15 d4 00 7d 05 48 8b 5f 78 66 c1 e8 05
[   23.720134] RIP  [<ffffffffb438ee5f>] kernfs_find_ns+0x1f/0x160
[   23.720134]  RSP <ffff88003e285af8>
[   23.720134] CR2: 0000000000000090


Thanks,
Sasha

Patch

diff --git a/drivers/rtc/rtc-cmos.c b/drivers/rtc/rtc-cmos.c
index cae212f..2c77d8e 100644
--- a/drivers/rtc/rtc-cmos.c
+++ b/drivers/rtc/rtc-cmos.c
@@ -712,6 +712,20 @@  cmos_do_probe(struct device *dev, struct resource *ports, int rtc_irq)
 		}
 	}
 
+	spin_lock_irq(&rtc_lock);
+	rtc_control = CMOS_READ(RTC_CONTROL);
+	spin_unlock_irq(&rtc_lock);
+
+	/* FIXME:
+	 * <asm-generic/rtc.h> doesn't know 12-hour mode either.
+	 */
+	if (is_valid_irq(rtc_irq) && !(rtc_control & RTC_24H)) {
+		dev_warn(dev, "only 24-hr supported\n");
+		retval = -ENXIO;
+		goto cleanup0;
+	}
+
+
 	cmos_rtc.dev = dev;
 	dev_set_drvdata(dev, &cmos_rtc);
 
@@ -739,49 +753,49 @@  cmos_do_probe(struct device *dev, struct resource *ports, int rtc_irq)
 	/* disable irqs */
 	cmos_irq_disable(&cmos_rtc, RTC_PIE | RTC_AIE | RTC_UIE);
 
-	rtc_control = CMOS_READ(RTC_CONTROL);
-
 	spin_unlock_irq(&rtc_lock);
 
-	/* FIXME:
-	 * <asm-generic/rtc.h> doesn't know 12-hour mode either.
-	 */
-	if (is_valid_irq(rtc_irq) && !(rtc_control & RTC_24H)) {
-		dev_warn(dev, "only 24-hr supported\n");
-		retval = -ENXIO;
-		goto cleanup1;
-	}
-
 	if (is_valid_irq(rtc_irq)) {
-		irq_handler_t rtc_cmos_int_handler;
+
+		irq_handler_t rtc_cmos_int_handler = NULL;
 
 		if (is_hpet_enabled()) {
-			rtc_cmos_int_handler = hpet_rtc_interrupt;
+
 			retval = hpet_register_irq_handler(cmos_interrupt);
 			if (retval) {
 				dev_warn(dev, "hpet_register_irq_handler "
 						" failed in rtc_init().");
-				goto cleanup1;
+			} else {
+				rtc_cmos_int_handler = hpet_rtc_interrupt;
 			}
-		} else
+		} else {
 			rtc_cmos_int_handler = cmos_interrupt;
+		}
 
-		retval = request_irq(rtc_irq, rtc_cmos_int_handler,
-				0, dev_name(&cmos_rtc.rtc->dev),
-				cmos_rtc.rtc);
-		if (retval < 0) {
-			dev_dbg(dev, "IRQ %d is already in use\n", rtc_irq);
-			goto cleanup1;
+		if (rtc_cmos_int_handler) {
+			retval = request_irq(rtc_irq, rtc_cmos_int_handler,
+					0, dev_name(&cmos_rtc.rtc->dev),
+					cmos_rtc.rtc);
+			if (retval < 0) {
+
+				dev_err(dev, "IRQ %d is already in use\n", rtc_irq);
+
+				cmos_rtc.irq = -1;
+
+				if (is_hpet_enabled()) {
+					hpet_unregister_irq_handler(cmos_interrupt);
+				}
+			}
 		}
 	}
+
 	hpet_rtc_timer_init();
 
 	/* export at least the first block of NVRAM */
 	nvram.size = address_space - NVRAM_OFFSET;
 	retval = sysfs_create_bin_file(&dev->kobj, &nvram);
 	if (retval < 0) {
-		dev_dbg(dev, "can't create nvram file? %d\n", retval);
-		goto cleanup2;
+		dev_err(dev, "can't create nvram file? %d\n", retval);
 	}
 
 	dev_info(dev, "%s%s, %zd bytes nvram%s\n",
@@ -795,12 +809,6 @@  cmos_do_probe(struct device *dev, struct resource *ports, int rtc_irq)
 
 	return 0;
 
-cleanup2:
-	if (is_valid_irq(rtc_irq))
-		free_irq(rtc_irq, cmos_rtc.rtc);
-cleanup1:
-	cmos_rtc.dev = NULL;
-	rtc_device_unregister(cmos_rtc.rtc);
 cleanup0:
 	release_region(ports->start, resource_size(ports));
 	return retval;
@@ -823,8 +831,12 @@  static void __exit cmos_do_remove(struct device *dev)
 	sysfs_remove_bin_file(&dev->kobj, &nvram);
 
 	if (is_valid_irq(cmos->irq)) {
+
 		free_irq(cmos->irq, cmos->rtc);
-		hpet_unregister_irq_handler(cmos_interrupt);
+
+		if (is_hpet_enabled()) {
+			hpet_unregister_irq_handler(cmos_interrupt);
+		}
 	}
 
 	rtc_device_unregister(cmos->rtc);