Patchwork ACPIPHP: fix device destroying order issue in handling dock notification

login
register
mail settings
Submitter Jiang Liu
Date June 11, 2013, 3 p.m.
Message ID <51B73BAB.3030406@gmail.com>
Download mbox | patch
Permalink /patch/250561/
State Not Applicable
Headers show

Comments

Jiang Liu - June 11, 2013, 3 p.m.
Hi Alexander,
    This is much more harder issue to resolve.  Let's first work around 
this
issue and check whether other things are OK. The patch below is just a
prove of concept, could you please help to try it?
Regards!
Gerry

---
---
On Tue 11 Jun 2013 09:38:59 PM CST, Alexander E. Patrakov wrote:
> 2013/6/11 Jiang Liu <liuj97@gmail.com>:
>> Hi Alexander,
>>      Sorry for the deadlock, I have no machine for testing:(
>> Below patch should fix the deadlock issue.
>
> There is another deadlock:
>
> [   34.316382] acpiphp_glue: found ACPI PCI Hotplug slot 1 at PCI 0000:0a:00
> [   34.316557] acpiphp: Slot [1-1] registered
>
> [   34.316569] =============================================
> [   34.316570] [ INFO: possible recursive locking detected ]
> [   34.316573] 3.10.0-rc4 #6 Tainted: G         C
> [   34.316575] ---------------------------------------------
> [   34.316577] kworker/0:0/4 is trying to acquire lock:
> [   34.316579]  (&dock_station->hp_lock){+.+.+.}, at:
> [<ffffffff813c766b>] register_hotplug_dock_device+0x6a/0xbf
> [   34.316588]
> but task is already holding lock:
> [   34.316590]  (&dock_station->hp_lock){+.+.+.}, at:
> [<ffffffff813c7270>] hotplug_dock_devices+0x2c/0xda
> [   34.316595]
> other info that might help us debug this:
> [   34.316597]  Possible unsafe locking scenario:
>
> [   34.316599]        CPU0
> [   34.316601]        ----
> [   34.316602]   lock(&dock_station->hp_lock);
> [   34.316605]   lock(&dock_station->hp_lock);
> [   34.316608]
>  *** DEADLOCK ***
>
> [   34.316611]  May be due to missing lock nesting notation
>
> [   34.316613] 5 locks held by kworker/0:0/4:
> [   34.316615]  #0:  (kacpi_hotplug){.+.+.+}, at: [<ffffffff8105c1a7>]
> process_one_work+0x157/0x560
> [   34.316624]  #1:  ((&dpc->work)#3){+.+.+.}, at:
> [<ffffffff8105c1a7>] process_one_work+0x157/0x560
> [   34.316631]  #2:  (acpi_scan_lock){+.+.+.}, at:
> [<ffffffff813c38fb>] acpi_scan_lock_acquire+0x12/0x14
> [   34.316639]  #3:  (&dock_station->hp_lock){+.+.+.}, at:
> [<ffffffff813c7270>] hotplug_dock_devices+0x2c/0xda
> [   34.316646]  #4:  (&slot->crit_sect){+.+.+.}, at:
> [<ffffffff813a0e8e>] acpiphp_enable_slot+0x1e/0x140
> [   34.316653]
> stack backtrace:
> [   34.316657] CPU: 0 PID: 4 Comm: kworker/0:0 Tainted: G         C
> 3.10.0-rc4 #6
> [   34.316659] Hardware name: Sony Corporation VPCZ23A4R/VAIO, BIOS
> R1013H5 05/21/2012
> [   34.316663] Workqueue: kacpi_hotplug acpi_os_execute_deferred
> [   34.316665]  ffff8802540adf40 ffff8802540d3628 ffffffff8165aaf8
> ffff8802540d3718
> [   34.316670]  ffffffff8109fe92 ffff8802540adf40 ffffffff8261c8a0
> ffff8802540ae700
> [   34.316675]  0000000000000000 ffff8802540d3748 000000000001f180
> ffff8802000000dc
> [   34.316680] Call Trace:
> [   34.316685]  [<ffffffff8165aaf8>] dump_stack+0x19/0x1b
> [   34.316689]  [<ffffffff8109fe92>] __lock_acquire+0x1522/0x1ee0
> [   34.316693]  [<ffffffff810a1751>] ? mark_held_locks+0x61/0x150
> [   34.316697]  [<ffffffff81660cc5>] ? _raw_spin_unlock_irqrestore+0x65/0x80
> [   34.316702]  [<ffffffff813ddfbc>] ? acpi_ns_get_node+0xb2/0xc2
> [   34.316705]  [<ffffffff813c766b>] ? register_hotplug_dock_device+0x6a/0xbf
> [   34.316709]  [<ffffffff810a0e77>] lock_acquire+0x87/0x150
> [   34.316712]  [<ffffffff813c766b>] ? register_hotplug_dock_device+0x6a/0xbf
> [   34.316715]  [<ffffffff813c766b>] ? register_hotplug_dock_device+0x6a/0xbf
> [   34.316720]  [<ffffffff8165d87e>] mutex_lock_nested+0x5e/0x3e0
> [   34.316723]  [<ffffffff813c766b>] ? register_hotplug_dock_device+0x6a/0xbf
> [   34.316726]  [<ffffffff81660c30>] ? _raw_spin_unlock+0x30/0x60
> [   34.316729]  [<ffffffff813c766b>] register_hotplug_dock_device+0x6a/0xbf
> [   34.316733]  [<ffffffff813a0637>] register_slot+0x467/0x5b0
> [   34.316738]  [<ffffffff813de0c8>] acpi_ns_walk_namespace+0xbb/0x17b
> [   34.316741]  [<ffffffff813c06e3>] ? acpi_os_wait_semaphore+0x3f/0x55
> [   34.316744]  [<ffffffff813a01d0>] ? free_bridge+0x100/0x100
> [   34.316748]  [<ffffffff813a01d0>] ? free_bridge+0x100/0x100
> [   34.316752]  [<ffffffff813de846>] acpi_walk_namespace+0x8e/0xc8
> [   34.316755]  [<ffffffff813a0b0d>] acpiphp_enumerate_slots+0x1bd/0x320
> [   34.316760]  [<ffffffff81448836>] ? pm_runtime_init+0x106/0x110
> [   34.316764]  [<ffffffff813a5a0f>] acpi_pci_add_bus+0x2f/0x40
> [   34.316768]  [<ffffffff815332f9>] pcibios_add_bus+0x9/0x10
> [   34.316772]  [<ffffffff81643168>] pci_add_new_bus+0x1c8/0x390
> [   34.316777]  [<ffffffff81380075>] pci_scan_bridge+0x5e5/0x620
> [   34.316781]  [<ffffffff816444e9>] enable_device+0x169/0x450
> [   34.316785]  [<ffffffff813a0f3a>] acpiphp_enable_slot+0xca/0x140
> [   34.316789]  [<ffffffff813a13b6>] __handle_hotplug_event_func+0x96/0x1a0
> [   34.316792]  [<ffffffff813c729b>] hotplug_dock_devices+0x57/0xda
> [   34.316796]  [<ffffffff813c7b06>] acpi_dock_deferred_cb+0xd4/0x1c8
> [   34.316799]  [<ffffffff813bfba9>] acpi_os_execute_deferred+0x20/0x2d
> [   34.316803]  [<ffffffff8105c212>] process_one_work+0x1c2/0x560
> [   34.316807]  [<ffffffff8105c1a7>] ? process_one_work+0x157/0x560
> [   34.316810]  [<ffffffff8105d126>] worker_thread+0x116/0x370
> [   34.316813]  [<ffffffff8105d010>] ? manage_workers.isra.20+0x2d0/0x2d0
> [   34.316818]  [<ffffffff81063986>] kthread+0xd6/0xe0
> [   34.316821]  [<ffffffff81660d0b>] ? _raw_spin_unlock_irq+0x2b/0x60
> [   34.316826]  [<ffffffff810638b0>] ? __init_kthread_worker+0x70/0x70
> [   34.316830]  [<ffffffff816680ac>] ret_from_fork+0x7c/0xb0
> [   34.316834]  [<ffffffff810638b0>] ? __init_kthread_worker+0x70/0x70
>
>> Regards!
>>
>> ----
>> diff --git a/drivers/pci/hotplug/acpiphp_glue.c
>> b/drivers/pci/hotplug/acpiphp_glue.c
>> index 0302645..699b8ca 100644
>> --- a/drivers/pci/hotplug/acpiphp_glue.c
>> +++ b/drivers/pci/hotplug/acpiphp_glue.c
>> @@ -1075,8 +1075,6 @@ static void
>> _handle_hotplug_event_func(acpi_handle handle, u32 type,
>>         struct acpi_buffer buffer = { .length = sizeof(objname),
>>                                       .pointer = objname };
>>
>> -       acpi_scan_lock_acquire();
>> -
>>         acpi_get_name(handle, ACPI_FULL_PATHNAME, &buffer);
>>
>>         switch (type) {
>> @@ -1108,8 +1106,6 @@ static void
>> _handle_hotplug_event_func(acpi_handle handle, u32 type,
>>                 warn("notify_handler: unknown event type 0x%x for
>> %s\n", type, objname);
>>                 break;
>>         }
>> -
>> -       acpi_scan_lock_release();
>>  }
>>
>>  static void _handle_hotplug_event_cb(struct work_struct *work)
>> @@ -1119,8 +1115,10 @@ static void _handle_hotplug_event_cb(struct
>> work_struct *work)
>>
>>         hp_work = container_of(work, struct acpi_hp_work, work);
>>         func = (struct acpiphp_func *)hp_work->context;
>> +       acpi_scan_lock_acquire();
>>         _handle_hotplug_event_func(hp_work->handle, hp_work->type,
>>                                     hp_work->context);
>> +       acpi_scan_lock_release();
>>         kfree(hp_work); /* allocated in handle_hotplug_event_func */
>>         put_bridge(func->slot->bridge);
>>  }
>>
>
>
>
> --
> Alexander E. Patrakov


--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jiang Liu - June 12, 2013, 2:49 a.m.
On Wed 12 Jun 2013 12:51:59 AM CST, Alexander E. Patrakov wrote:
> 2013/6/11 Jiang Liu <liuj97@gmail.com>:
>> Hi Alexander,
>>     This is much more harder issue to resolve.  Let's first work around
>> this
>> issue and check whether other things are OK. The patch below is just a
>> prove of concept, could you please help to try it?
>
> In the initially-undocked case it passes the "dock and undock three
> times, verify lspci output at each step" test.
>
> In the initially-docked case, it exhibits the following problem: when
> I press the undock button, only one PCI device disappears, and the
> "docked" LED does not turn off. Additionally, there is a hung task.
Hi Alexander,
     In the initially-docked case, the failure is caused by an issue in
the intel sound card driver. Seems something is wrong with reference
count management and it never returns to zero on driver detach.
Could you please help to disable the Intel sound card driver and try
again?

I'm not familiar with Intel HDA driver,  so please help to fire another
bug for it.

Regards!
Gerry

>
> Both dmesgs are attached.
>


--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alexander E. Patrakov - June 12, 2013, 3:44 a.m.
2013/6/12 Jiang Liu <liuj97@gmail.com>:
> On Wed 12 Jun 2013 12:51:59 AM CST, Alexander E. Patrakov wrote:
>> 2013/6/11 Jiang Liu <liuj97@gmail.com>:
>>> Hi Alexander,
>>>     This is much more harder issue to resolve.  Let's first work around
>>> this
>>> issue and check whether other things are OK. The patch below is just a
>>> prove of concept, could you please help to try it?
>>
>> In the initially-undocked case it passes the "dock and undock three
>> times, verify lspci output at each step" test.
>>
>> In the initially-docked case, it exhibits the following problem: when
>> I press the undock button, only one PCI device disappears, and the
>> "docked" LED does not turn off. Additionally, there is a hung task.
> Hi Alexander,
>      In the initially-docked case, the failure is caused by an issue in
> the intel sound card driver. Seems something is wrong with reference
> count management and it never returns to zero on driver detach.
> Could you please help to disable the Intel sound card driver and try
> again?
>
> I'm not familiar with Intel HDA driver,  so please help to fire another
> bug for it.

Thanks for pointing the finger to snd-hda-intel. With that driver
blacklisted, the lspci output matches the expectations even after
undocking the initially-docked laptop. Redocking re-adds the devices,
too. So the situation is almost as good as in the initially-undocked
case, you only have to deal with this:

[   64.312253] ata8.00: disabled
[   64.318462] cdrom: issuing MRW background format suspend
[   64.320288] INFO: trying to register non-static key.
[   64.320292] the code is fine but needs lockdep annotation.
[   64.320294] turning off the locking correctness validator.
[   64.320298] CPU: 0 PID: 40 Comm: kworker/0:1 Tainted: G         C
3.10.0-rc4 #7
[   64.320301] Hardware name: Sony Corporation VPCZ23A4R/VAIO, BIOS
R1013H5 05/21/2012
[   64.320306] Workqueue: kacpi_hotplug acpi_os_execute_deferred
[   64.320309]  ffff880253db0000 ffff880253db9688 ffffffff8165aab8
ffff880253db9778
[   64.320314]  ffffffff810a018e ffff880253db07c0 0000000000000000
ffff880200000000
[   64.320319]  0000000000000000 ffff880200000000 ffff880253db07e0
000000000000005a
[   64.320324] Call Trace:
[   64.320330]  [<ffffffff8165aab8>] dump_stack+0x19/0x1b
[   64.320335]  [<ffffffff810a018e>] __lock_acquire+0x181e/0x1ee0
[   64.320338]  [<ffffffff810a1751>] ? mark_held_locks+0x61/0x150
[   64.320341]  [<ffffffff810a1aae>] ? debug_check_no_locks_freed+0x8e/0x160
[   64.320347]  [<ffffffff8105b940>] ? queue_delayed_work_on+0xa0/0xa0
[   64.320350]  [<ffffffff810a0e77>] lock_acquire+0x87/0x150
[   64.320354]  [<ffffffff8105b940>] ? queue_delayed_work_on+0xa0/0xa0
[   64.320357]  [<ffffffff8109c430>] ? lockdep_init_map+0xb0/0x530
[   64.320361]  [<ffffffff8105b978>] flush_work+0x38/0x270
[   64.320365]  [<ffffffff8105b940>] ? queue_delayed_work_on+0xa0/0xa0
[   64.320368]  [<ffffffff810a1751>] ? mark_held_locks+0x61/0x150
[   64.320371]  [<ffffffff8105d645>] ? __cancel_work_timer+0xa5/0x110
[   64.320375]  [<ffffffff810a1945>] ? trace_hardirqs_on_caller+0x105/0x1d0
[   64.320378]  [<ffffffff8105d61a>] __cancel_work_timer+0x7a/0x110
[   64.320381]  [<ffffffff8105d6cb>] cancel_work_sync+0xb/0x10
[   64.320389]  [<ffffffffa00773ae>] rtl_remove_one+0x5e/0x140 [r8169]
[   64.320394]  [<ffffffff81385631>] pci_device_remove+0x41/0xc0
[   64.320399]  [<ffffffff8143bd47>] __device_release_driver+0x77/0xe0
[   64.320403]  [<ffffffff8143bdd9>] device_release_driver+0x29/0x40
[   64.320407]  [<ffffffff8143b7e1>] bus_remove_device+0xf1/0x140
[   64.320411]  [<ffffffff81438ecd>] device_del+0x11d/0x1b0
[   64.320416]  [<ffffffff8138062c>] pci_stop_bus_device+0x9c/0xb0
[   64.320420]  [<ffffffff813805cb>] pci_stop_bus_device+0x3b/0xb0
[   64.320423]  [<ffffffff813805cb>] pci_stop_bus_device+0x3b/0xb0
[   64.320427]  [<ffffffff813805cb>] pci_stop_bus_device+0x3b/0xb0
[   64.320431]  [<ffffffff813a0fe5>] ? acpiphp_disable_slot+0x35/0x140
[   64.320435]  [<ffffffff813805cb>] pci_stop_bus_device+0x3b/0xb0
[   64.320437]  [<ffffffff813805cb>] pci_stop_bus_device+0x3b/0xb0
[   64.320440]  [<ffffffff813805cb>] pci_stop_bus_device+0x3b/0xb0
[   64.320443]  [<ffffffff81380791>] pci_stop_and_remove_bus_device+0x11/0x20
[   64.320446]  [<ffffffff813a1036>] acpiphp_disable_slot+0x86/0x140
[   64.320450]  [<ffffffff813a13da>] __handle_hotplug_event_func+0xba/0x1a0
[   64.320454]  [<ffffffff813c729b>] hotplug_dock_devices+0x57/0xda
[   64.320458]  [<ffffffff813c79a1>] handle_eject_request+0xaf/0xdf
[   64.320461]  [<ffffffff813c7b6f>] acpi_dock_deferred_cb+0x163/0x1c8
[   64.320465]  [<ffffffff813bfba9>] acpi_os_execute_deferred+0x20/0x2d
[   64.320468]  [<ffffffff8105c212>] process_one_work+0x1c2/0x560
[   64.320472]  [<ffffffff8105c1a7>] ? process_one_work+0x157/0x560
[   64.320475]  [<ffffffff8105d126>] worker_thread+0x116/0x370
[   64.320479]  [<ffffffff8105d010>] ? manage_workers.isra.20+0x2d0/0x2d0
[   64.320483]  [<ffffffff81063986>] kthread+0xd6/0xe0
[   64.320487]  [<ffffffff810638b0>] ? __init_kthread_worker+0x70/0x70
[   64.320492]  [<ffffffff8166806c>] ret_from_fork+0x7c/0xb0
[   64.320496]  [<ffffffff810638b0>] ? __init_kthread_worker+0x70/0x70
[   64.334782] vgaarb: device changed decodes:
PCI:0000:00:02.0,olddecodes=none,decodes=io+mem:owns=none
[   64.337187] pci_bus 0000:0b: busn_res: [bus 0b] is released
[   64.337410] pci_bus 0000:0c: busn_res: [bus 0c] is released
[   64.337472] pci_bus 0000:16: busn_res: [bus 16] is released

As for snd-hda-intel bug, I will file it later today and let you know.
Alexander E. Patrakov - June 12, 2013, 5:05 p.m.
2013/6/12 Alexander E. Patrakov <patrakov@gmail.com>:
> 2013/6/12 Jiang Liu <liuj97@gmail.com>:
>> On Wed 12 Jun 2013 12:51:59 AM CST, Alexander E. Patrakov wrote:
>>> In the initially-docked case, it exhibits the following problem: when
>>> I press the undock button, only one PCI device disappears, and the
>>> "docked" LED does not turn off. Additionally, there is a hung task.
>> Hi Alexander,
>>      In the initially-docked case, the failure is caused by an issue in
>> the intel sound card driver. Seems something is wrong with reference
>> count management and it never returns to zero on driver detach.
>> Could you please help to disable the Intel sound card driver and try
>> again?
>>
>> I'm not familiar with Intel HDA driver,  so please help to fire another
>> bug for it.
>
> Thanks for pointing the finger to snd-hda-intel. With that driver
> blacklisted, the lspci output matches the expectations even after
> undocking the initially-docked laptop. Redocking re-adds the devices,
> too.

> As for snd-hda-intel bug, I will file it later today and let you know.

Actually, I debugged it further and want some input from you before
filing. Here is the new information: the incomplete undocking and hung
task does not appear if I test from a virtual console, not from an
XFCE session. The obvious difference is that in the XFCE session some
processes (the mixer applet and pulseaudio) keep the mixer device
always open.

I am not qualified enough to say what should happen if the user
presses the undock button while a program has a to-be-undocked device
open. Or, for that matter and for an analogy, if the user unplugs a
USB sound card while it is playing.

And in my case, this is going to be an issue not only because of the
snd-hda-intel driver. Xorg would sometimes keep the DRM node of the
Radeon card (that is in the dock station) open in addition to the
onboard Intel card.

--
Alexander E. Patrakov
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alexander E. Patrakov - June 12, 2013, 6:37 p.m.
2013/6/12 Jiang Liu <liuj97@gmail.com>:
> Hi Alexander,
>      In the initially-docked case, the failure is caused by an issue in
> the intel sound card driver. Seems something is wrong with reference
> count management and it never returns to zero on driver detach.
> Could you please help to disable the Intel sound card driver and try
> again?

I have discussed the problem with pulseaudio developers. According to
my understanding (that may be wrong), we have a classic ABBA deadlock.
The device will go away when its reference count goes to zero. The
reference count will go to zero when there are no open fds. The mixer
fd will be closed when it gets a -EIO, i.e. after the device goes
away.

If the above understanding is correct, I think that waiting for zero
references should be at least questioned.

Patch

diff --git a/drivers/acpi/dock.c b/drivers/acpi/dock.c
index a7ba608..dde1cec 100644
--- a/drivers/acpi/dock.c
+++ b/drivers/acpi/dock.c
@@ -121,9 +121,7 @@  static void
 dock_add_hotplug_device(struct dock_station *ds,
                        struct dock_dependent_device *dd)
 {
-       mutex_lock(&ds->hp_lock);
        list_add_tail(&dd->hotplug_list, &ds->hotplug_devices);
-       mutex_unlock(&ds->hp_lock);
 }

 /**
@@ -137,9 +135,7 @@  static void
 dock_del_hotplug_device(struct dock_station *ds,
                        struct dock_dependent_device *dd)
 {
-       mutex_lock(&ds->hp_lock);
        list_del_init(&dd->hotplug_list);
-       mutex_unlock(&ds->hp_lock);
 }

 /**