diff mbox

PCI: Move device_del() from pci_stop_dev() to pci_destroy_dev()

Message ID 53283503.VcSP9RLiFc@vostro.rjw.lan
State Accepted
Headers show

Commit Message

Rafael J. Wysocki Nov. 24, 2013, 12:17 a.m. UTC
From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

After commit bcdde7e221a8 (sysfs: make __sysfs_remove_dir() recursive)
I'm seeing traces analogous to the one below in Thunderbolt testing:

WARNING: CPU: 3 PID: 76 at /scratch/rafael/work/linux-pm/fs/sysfs/group.c:214 sysfs_remove_group+0x59/0xe0()
 sysfs group ffffffff81c6c500 not found for kobject '0000:08'
 Modules linked in: fuse hidp af_packet xt_tcpudp xt_pkttype xt_LOG xt_limit ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_raw ipt_REJECT iptable_raw xt_CT iptable_filter ip6table_mangle nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_ipv4 nf_defrag_ipv4 ip_tables xt_conntrack nf_conntrack rfcomm ip6table_filter bnep ip6_tables x_tables arc4 ath9k mac80211 x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul iTCO_wdt crc32c_intel iTCO_vendor_support ghash_clmulni_intel aesni_intel ablk_helper acer_wmi sparse_keymap ath9k_common ath9k_hw cryptd lrw gf128mul ath3k glue_helper aes_x86_64 btusb microcode ath pcspkr joydev uvcvideo serio_raw videobuf2_core i2c_i801 videodev snd_hda_codec_hdmi cfg80211 videobuf2_vmalloc tg3 videobuf2_memops sg ptp pps_core lpc_ich mfd_core snd_hda_codec_realtek bluetooth hid_logitech_dj rfkill snd_hda_intel snd_hda_codec shpchp battery ac wmi acpi_cpufreq edd snd_usb_audio snd_pcm snd_page_alloc snd_hwdep snd_usbmidi_lib 
snd_rawmidi snd_seq snd_timer snd_seq_device snd soundcore autofs4 xhci_hcd processor scsi_dh_hp_sw scsi_dh_rdac scsi_dh_emc scsi_dh_alua scsi_dh
 CPU: 3 PID: 76 Comm: kworker/u16:7 Not tainted 3.13.0-rc1+ #76
 Hardware name: Acer Aspire S5-391/Venus    , BIOS V1.02 05/29/2012
 Workqueue: kacpi_hotplug acpi_hotplug_work_fn
  0000000000000009 ffff8801644b9ac8 ffffffff816b23bf 0000000000000007
  ffff8801644b9b18 ffff8801644b9b08 ffffffff81046607 ffff88016925b800
  0000000000000000 ffffffff81c6c500 ffff88016924f928 ffff88016924f800
 Call Trace:
  [<ffffffff816b23bf>] dump_stack+0x4e/0x71
  [<ffffffff81046607>] warn_slowpath_common+0x87/0xb0
  [<ffffffff810466d1>] warn_slowpath_fmt+0x41/0x50
  [<ffffffff811e42ef>] ? sysfs_get_dirent_ns+0x6f/0x80
  [<ffffffff811e5389>] sysfs_remove_group+0x59/0xe0
  [<ffffffff8149f00b>] dpm_sysfs_remove+0x3b/0x50
  [<ffffffff81495818>] device_del+0x58/0x1c0
  [<ffffffff814959c8>] device_unregister+0x48/0x60
  [<ffffffff813254fe>] pci_remove_bus+0x6e/0x80
  [<ffffffff81325548>] pci_remove_bus_device+0x38/0x110
  [<ffffffff8132555d>] pci_remove_bus_device+0x4d/0x110
  [<ffffffff81325639>] pci_stop_and_remove_bus_device+0x19/0x20
  [<ffffffff813418d0>] disable_slot+0x20/0xe0
  [<ffffffff81341a38>] acpiphp_check_bridge+0xa8/0xd0
  [<ffffffff813427ad>] hotplug_event+0x17d/0x220
  [<ffffffff81342880>] hotplug_event_work+0x30/0x70
  [<ffffffff8136d665>] acpi_hotplug_work_fn+0x18/0x24
  [<ffffffff81061331>] process_one_work+0x261/0x450
  [<ffffffff81061a7e>] worker_thread+0x21e/0x370
  [<ffffffff81061860>] ? rescuer_thread+0x300/0x300
  [<ffffffff81068342>] kthread+0xd2/0xe0
  [<ffffffff81068270>] ? flush_kthread_worker+0x70/0x70
  [<ffffffff816c19bc>] ret_from_fork+0x7c/0xb0
  [<ffffffff81068270>] ? flush_kthread_worker+0x70/0x70

(Mika Westerberg sees them too in his tests).

Some investigation documented in kernel bug #65281 lead me to the
conclusion that the source of the problem is the device_del() in
pci_stop_dev() as it now causes the sysfs directory of the device
to be removed recursively along with all of its subdirectories.
That includes the sysfs directory of the device's subordinate
bus (dev->subordinate) and its "power" group.

Consequently, when pci_remove_bus() is called for dev->subordinate
in pci_remove_bus_device(), it calls device_unregister(&bus->dev),
but at this point the sysfs directory of bus->dev doesn't exist any
more and its "power" group doesn't exist either.  Thus, when
dpm_sysfs_remove() called from device_del() tries to remove that
group, it triggers the above warning.

That indicates a logical mistake in the design of
pci_stop_and_remove_bus_device(), which causes bus device objects
to be left behind their parents (bridge device objects) and can be
fixed by moving the device_del() from pci_stop_dev() into
pci_destroy_dev(), so pci_remove_bus() can be called for the
device's subordinate bus before the device itself is unregistered
from the hierarchy.  Still, the driver, if any, should be detached
from the device in pci_stop_dev(), so use device_release_driver()
directly from there.

References: https://bugzilla.kernel.org/show_bug.cgi?id=65281#c6
Reported-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
 drivers/pci/remove.c |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)


--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Yinghai Lu Nov. 25, 2013, 4:54 a.m. UTC | #1
On Sat, Nov 23, 2013 at 4:17 PM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> After commit bcdde7e221a8 (sysfs: make __sysfs_remove_dir() recursive)
> I'm seeing traces analogous to the one below in Thunderbolt testing:
>
> WARNING: CPU: 3 PID: 76 at /scratch/rafael/work/linux-pm/fs/sysfs/group.c:214 sysfs_remove_group+0x59/0xe0()
>  sysfs group ffffffff81c6c500 not found for kobject '0000:08'
>  Modules linked in: fuse hidp af_packet xt_tcpudp xt_pkttype xt_LOG xt_limit ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_raw ipt_REJECT iptable_raw xt_CT iptable_filter ip6table_mangle nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_ipv4 nf_defrag_ipv4 ip_tables xt_conntrack nf_conntrack rfcomm ip6table_filter bnep ip6_tables x_tables arc4 ath9k mac80211 x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul iTCO_wdt crc32c_intel iTCO_vendor_support ghash_clmulni_intel aesni_intel ablk_helper acer_wmi sparse_keymap ath9k_common ath9k_hw cryptd lrw gf128mul ath3k glue_helper aes_x86_64 btusb microcode ath pcspkr joydev uvcvideo serio_raw videobuf2_core i2c_i801 videodev snd_hda_codec_hdmi cfg80211 videobuf2_vmalloc tg3 videobuf2_memops sg ptp pps_core lpc_ich mfd_core snd_hda_codec_realtek bluetooth hid_logitech_dj rfkill snd_hda_intel snd_hda_codec shpchp battery ac wmi acpi_cpufreq edd snd_usb_audio snd_pcm snd_page_alloc snd_hwdep snd_usbmidi_lib
> snd_rawmidi snd_seq snd_timer snd_seq_device snd soundcore autofs4 xhci_hcd processor scsi_dh_hp_sw scsi_dh_rdac scsi_dh_emc scsi_dh_alua scsi_dh
>  CPU: 3 PID: 76 Comm: kworker/u16:7 Not tainted 3.13.0-rc1+ #76
>  Hardware name: Acer Aspire S5-391/Venus    , BIOS V1.02 05/29/2012
>  Workqueue: kacpi_hotplug acpi_hotplug_work_fn
>   0000000000000009 ffff8801644b9ac8 ffffffff816b23bf 0000000000000007
>   ffff8801644b9b18 ffff8801644b9b08 ffffffff81046607 ffff88016925b800
>   0000000000000000 ffffffff81c6c500 ffff88016924f928 ffff88016924f800
>  Call Trace:
>   [<ffffffff816b23bf>] dump_stack+0x4e/0x71
>   [<ffffffff81046607>] warn_slowpath_common+0x87/0xb0
>   [<ffffffff810466d1>] warn_slowpath_fmt+0x41/0x50
>   [<ffffffff811e42ef>] ? sysfs_get_dirent_ns+0x6f/0x80
>   [<ffffffff811e5389>] sysfs_remove_group+0x59/0xe0
>   [<ffffffff8149f00b>] dpm_sysfs_remove+0x3b/0x50
>   [<ffffffff81495818>] device_del+0x58/0x1c0
>   [<ffffffff814959c8>] device_unregister+0x48/0x60
>   [<ffffffff813254fe>] pci_remove_bus+0x6e/0x80
>   [<ffffffff81325548>] pci_remove_bus_device+0x38/0x110
>   [<ffffffff8132555d>] pci_remove_bus_device+0x4d/0x110
>   [<ffffffff81325639>] pci_stop_and_remove_bus_device+0x19/0x20
>   [<ffffffff813418d0>] disable_slot+0x20/0xe0
>   [<ffffffff81341a38>] acpiphp_check_bridge+0xa8/0xd0
>   [<ffffffff813427ad>] hotplug_event+0x17d/0x220
>   [<ffffffff81342880>] hotplug_event_work+0x30/0x70
>   [<ffffffff8136d665>] acpi_hotplug_work_fn+0x18/0x24
>   [<ffffffff81061331>] process_one_work+0x261/0x450
>   [<ffffffff81061a7e>] worker_thread+0x21e/0x370
>   [<ffffffff81061860>] ? rescuer_thread+0x300/0x300
>   [<ffffffff81068342>] kthread+0xd2/0xe0
>   [<ffffffff81068270>] ? flush_kthread_worker+0x70/0x70
>   [<ffffffff816c19bc>] ret_from_fork+0x7c/0xb0
>   [<ffffffff81068270>] ? flush_kthread_worker+0x70/0x70
>
> (Mika Westerberg sees them too in his tests).
>
> Some investigation documented in kernel bug #65281 lead me to the
> conclusion that the source of the problem is the device_del() in
> pci_stop_dev() as it now causes the sysfs directory of the device
> to be removed recursively along with all of its subdirectories.
> That includes the sysfs directory of the device's subordinate
> bus (dev->subordinate) and its "power" group.
>
> Consequently, when pci_remove_bus() is called for dev->subordinate
> in pci_remove_bus_device(), it calls device_unregister(&bus->dev),
> but at this point the sysfs directory of bus->dev doesn't exist any
> more and its "power" group doesn't exist either.  Thus, when
> dpm_sysfs_remove() called from device_del() tries to remove that
> group, it triggers the above warning.
>
> That indicates a logical mistake in the design of
> pci_stop_and_remove_bus_device(), which causes bus device objects
> to be left behind their parents (bridge device objects) and can be
> fixed by moving the device_del() from pci_stop_dev() into
> pci_destroy_dev(), so pci_remove_bus() can be called for the
> device's subordinate bus before the device itself is unregistered
> from the hierarchy.  Still, the driver, if any, should be detached
> from the device in pci_stop_dev(), so use device_release_driver()
> directly from there.
>
> References: https://bugzilla.kernel.org/show_bug.cgi?id=65281#c6
> Reported-by: Mika Westerberg <mika.westerberg@linux.intel.com>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
>  drivers/pci/remove.c |    4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> Index: linux-pm/drivers/pci/remove.c
> ===================================================================
> --- linux-pm.orig/drivers/pci/remove.c
> +++ linux-pm/drivers/pci/remove.c
> @@ -24,7 +24,7 @@ static void pci_stop_dev(struct pci_dev
>         if (dev->is_added) {
>                 pci_proc_detach_device(dev);
>                 pci_remove_sysfs_dev_files(dev);
> -               device_del(&dev->dev);
> +               device_release_driver(&dev->dev);
>                 dev->is_added = 0;
>         }
>
> @@ -34,6 +34,8 @@ static void pci_stop_dev(struct pci_dev
>
>  static void pci_destroy_dev(struct pci_dev *dev)
>  {
> +       device_del(&dev->dev);
> +
>         down_write(&pci_bus_sem);
>         list_del(&dev->bus_list);
>         up_write(&pci_bus_sem);
>

Maybe that is not enough. could still need add is_removed ...

Please check

https://lkml.org/lkml/2013/5/13/653
PATCH 3/7] PCI: Detach driver in pci_stop_device
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Yinghai Lu Nov. 25, 2013, 4:58 a.m. UTC | #2
On Sun, Nov 24, 2013 at 8:54 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> On Sat, Nov 23, 2013 at 4:17 PM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
>> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>>
>> After commit bcdde7e221a8 (sysfs: make __sysfs_remove_dir() recursive)
>> I'm seeing traces analogous to the one below in Thunderbolt testing:
>>
>> WARNING: CPU: 3 PID: 76 at /scratch/rafael/work/linux-pm/fs/sysfs/group.c:214 sysfs_remove_group+0x59/0xe0()
>>  sysfs group ffffffff81c6c500 not found for kobject '0000:08'
>>  Modules linked in: fuse hidp af_packet xt_tcpudp xt_pkttype xt_LOG xt_limit ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_raw ipt_REJECT iptable_raw xt_CT iptable_filter ip6table_mangle nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_ipv4 nf_defrag_ipv4 ip_tables xt_conntrack nf_conntrack rfcomm ip6table_filter bnep ip6_tables x_tables arc4 ath9k mac80211 x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul iTCO_wdt crc32c_intel iTCO_vendor_support ghash_clmulni_intel aesni_intel ablk_helper acer_wmi sparse_keymap ath9k_common ath9k_hw cryptd lrw gf128mul ath3k glue_helper aes_x86_64 btusb microcode ath pcspkr joydev uvcvideo serio_raw videobuf2_core i2c_i801 videodev snd_hda_codec_hdmi cfg80211 videobuf2_vmalloc tg3 videobuf2_memops sg ptp pps_core lpc_ich mfd_core snd_hda_codec_realtek bluetooth hid_logitech_dj rfkill snd_hda_intel snd_hda_codec shpchp battery ac wmi acpi_cpufreq edd snd_usb_audio snd_pcm snd_page_alloc snd_hwdep snd_usbmidi_lib
>> snd_rawmidi snd_seq snd_timer snd_seq_device snd soundcore autofs4 xhci_hcd processor scsi_dh_hp_sw scsi_dh_rdac scsi_dh_emc scsi_dh_alua scsi_dh
>>  CPU: 3 PID: 76 Comm: kworker/u16:7 Not tainted 3.13.0-rc1+ #76
>>  Hardware name: Acer Aspire S5-391/Venus    , BIOS V1.02 05/29/2012
>>  Workqueue: kacpi_hotplug acpi_hotplug_work_fn
>>   0000000000000009 ffff8801644b9ac8 ffffffff816b23bf 0000000000000007
>>   ffff8801644b9b18 ffff8801644b9b08 ffffffff81046607 ffff88016925b800
>>   0000000000000000 ffffffff81c6c500 ffff88016924f928 ffff88016924f800
>>  Call Trace:
>>   [<ffffffff816b23bf>] dump_stack+0x4e/0x71
>>   [<ffffffff81046607>] warn_slowpath_common+0x87/0xb0
>>   [<ffffffff810466d1>] warn_slowpath_fmt+0x41/0x50
>>   [<ffffffff811e42ef>] ? sysfs_get_dirent_ns+0x6f/0x80
>>   [<ffffffff811e5389>] sysfs_remove_group+0x59/0xe0
>>   [<ffffffff8149f00b>] dpm_sysfs_remove+0x3b/0x50
>>   [<ffffffff81495818>] device_del+0x58/0x1c0
>>   [<ffffffff814959c8>] device_unregister+0x48/0x60
>>   [<ffffffff813254fe>] pci_remove_bus+0x6e/0x80
>>   [<ffffffff81325548>] pci_remove_bus_device+0x38/0x110
>>   [<ffffffff8132555d>] pci_remove_bus_device+0x4d/0x110
>>   [<ffffffff81325639>] pci_stop_and_remove_bus_device+0x19/0x20
>>   [<ffffffff813418d0>] disable_slot+0x20/0xe0
>>   [<ffffffff81341a38>] acpiphp_check_bridge+0xa8/0xd0
>>   [<ffffffff813427ad>] hotplug_event+0x17d/0x220
>>   [<ffffffff81342880>] hotplug_event_work+0x30/0x70
>>   [<ffffffff8136d665>] acpi_hotplug_work_fn+0x18/0x24
>>   [<ffffffff81061331>] process_one_work+0x261/0x450
>>   [<ffffffff81061a7e>] worker_thread+0x21e/0x370
>>   [<ffffffff81061860>] ? rescuer_thread+0x300/0x300
>>   [<ffffffff81068342>] kthread+0xd2/0xe0
>>   [<ffffffff81068270>] ? flush_kthread_worker+0x70/0x70
>>   [<ffffffff816c19bc>] ret_from_fork+0x7c/0xb0
>>   [<ffffffff81068270>] ? flush_kthread_worker+0x70/0x70
>>
>> (Mika Westerberg sees them too in his tests).
>>
>> Some investigation documented in kernel bug #65281 lead me to the
>> conclusion that the source of the problem is the device_del() in
>> pci_stop_dev() as it now causes the sysfs directory of the device
>> to be removed recursively along with all of its subdirectories.
>> That includes the sysfs directory of the device's subordinate
>> bus (dev->subordinate) and its "power" group.
>>
>> Consequently, when pci_remove_bus() is called for dev->subordinate
>> in pci_remove_bus_device(), it calls device_unregister(&bus->dev),
>> but at this point the sysfs directory of bus->dev doesn't exist any
>> more and its "power" group doesn't exist either.  Thus, when
>> dpm_sysfs_remove() called from device_del() tries to remove that
>> group, it triggers the above warning.
>>
>> That indicates a logical mistake in the design of
>> pci_stop_and_remove_bus_device(), which causes bus device objects
>> to be left behind their parents (bridge device objects) and can be
>> fixed by moving the device_del() from pci_stop_dev() into
>> pci_destroy_dev(), so pci_remove_bus() can be called for the
>> device's subordinate bus before the device itself is unregistered
>> from the hierarchy.  Still, the driver, if any, should be detached
>> from the device in pci_stop_dev(), so use device_release_driver()
>> directly from there.
>>
>> References: https://bugzilla.kernel.org/show_bug.cgi?id=65281#c6
>> Reported-by: Mika Westerberg <mika.westerberg@linux.intel.com>
>> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>> ---
>>  drivers/pci/remove.c |    4 +++-
>>  1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> Index: linux-pm/drivers/pci/remove.c
>> ===================================================================
>> --- linux-pm.orig/drivers/pci/remove.c
>> +++ linux-pm/drivers/pci/remove.c
>> @@ -24,7 +24,7 @@ static void pci_stop_dev(struct pci_dev
>>         if (dev->is_added) {
>>                 pci_proc_detach_device(dev);
>>                 pci_remove_sysfs_dev_files(dev);
>> -               device_del(&dev->dev);
>> +               device_release_driver(&dev->dev);
>>                 dev->is_added = 0;
>>         }
>>
>> @@ -34,6 +34,8 @@ static void pci_stop_dev(struct pci_dev
>>
>>  static void pci_destroy_dev(struct pci_dev *dev)
>>  {
>> +       device_del(&dev->dev);
>> +
>>         down_write(&pci_bus_sem);
>>         list_del(&dev->bus_list);
>>         up_write(&pci_bus_sem);
>>
>
> Maybe that is not enough. could still need add is_removed ...
>
> Please check
>
> https://lkml.org/lkml/2013/5/13/653
> PATCH 3/7] PCI: Detach driver in pci_stop_device

or you can check if
http://patchwork.ozlabs.org/patch/292622/
  [1/6] PCI: move back pci_proc_attach_devices calling
http://patchwork.ozlabs.org/patch/292623/
  [2/6] PCI: move resources and bus_list releasing to pci_release_dev
http://patchwork.ozlabs.org/patch/292624/
  [3/6] PCI: Destroy pci dev only once

could help with you test setup.

Thanks

Yinghai
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rafael J. Wysocki Nov. 25, 2013, 11:22 a.m. UTC | #3
On Sunday, November 24, 2013 08:54:12 PM Yinghai Lu wrote:
> On Sat, Nov 23, 2013 at 4:17 PM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >
> > After commit bcdde7e221a8 (sysfs: make __sysfs_remove_dir() recursive)
> > I'm seeing traces analogous to the one below in Thunderbolt testing:
> >
> > WARNING: CPU: 3 PID: 76 at /scratch/rafael/work/linux-pm/fs/sysfs/group.c:214 sysfs_remove_group+0x59/0xe0()
> >  sysfs group ffffffff81c6c500 not found for kobject '0000:08'
> >  Modules linked in: fuse hidp af_packet xt_tcpudp xt_pkttype xt_LOG xt_limit ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_raw ipt_REJECT iptable_raw xt_CT iptable_filter ip6table_mangle nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_ipv4 nf_defrag_ipv4 ip_tables xt_conntrack nf_conntrack rfcomm ip6table_filter bnep ip6_tables x_tables arc4 ath9k mac80211 x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul iTCO_wdt crc32c_intel iTCO_vendor_support ghash_clmulni_intel aesni_intel ablk_helper acer_wmi sparse_keymap ath9k_common ath9k_hw cryptd lrw gf128mul ath3k glue_helper aes_x86_64 btusb microcode ath pcspkr joydev uvcvideo serio_raw videobuf2_core i2c_i801 videodev snd_hda_codec_hdmi cfg80211 videobuf2_vmalloc tg3 videobuf2_memops sg ptp pps_core lpc_ich mfd_core snd_hda_codec_realtek bluetooth hid_logitech_dj rfkill snd_hda_intel snd_hda_codec shpchp battery ac wmi acpi_cpufreq edd snd_usb_audio snd_pcm snd_page_alloc snd_hwdep snd_usbmidi_lib
> > snd_rawmidi snd_seq snd_timer snd_seq_device snd soundcore autofs4 xhci_hcd processor scsi_dh_hp_sw scsi_dh_rdac scsi_dh_emc scsi_dh_alua scsi_dh
> >  CPU: 3 PID: 76 Comm: kworker/u16:7 Not tainted 3.13.0-rc1+ #76
> >  Hardware name: Acer Aspire S5-391/Venus    , BIOS V1.02 05/29/2012
> >  Workqueue: kacpi_hotplug acpi_hotplug_work_fn
> >   0000000000000009 ffff8801644b9ac8 ffffffff816b23bf 0000000000000007
> >   ffff8801644b9b18 ffff8801644b9b08 ffffffff81046607 ffff88016925b800
> >   0000000000000000 ffffffff81c6c500 ffff88016924f928 ffff88016924f800
> >  Call Trace:
> >   [<ffffffff816b23bf>] dump_stack+0x4e/0x71
> >   [<ffffffff81046607>] warn_slowpath_common+0x87/0xb0
> >   [<ffffffff810466d1>] warn_slowpath_fmt+0x41/0x50
> >   [<ffffffff811e42ef>] ? sysfs_get_dirent_ns+0x6f/0x80
> >   [<ffffffff811e5389>] sysfs_remove_group+0x59/0xe0
> >   [<ffffffff8149f00b>] dpm_sysfs_remove+0x3b/0x50
> >   [<ffffffff81495818>] device_del+0x58/0x1c0
> >   [<ffffffff814959c8>] device_unregister+0x48/0x60
> >   [<ffffffff813254fe>] pci_remove_bus+0x6e/0x80
> >   [<ffffffff81325548>] pci_remove_bus_device+0x38/0x110
> >   [<ffffffff8132555d>] pci_remove_bus_device+0x4d/0x110
> >   [<ffffffff81325639>] pci_stop_and_remove_bus_device+0x19/0x20
> >   [<ffffffff813418d0>] disable_slot+0x20/0xe0
> >   [<ffffffff81341a38>] acpiphp_check_bridge+0xa8/0xd0
> >   [<ffffffff813427ad>] hotplug_event+0x17d/0x220
> >   [<ffffffff81342880>] hotplug_event_work+0x30/0x70
> >   [<ffffffff8136d665>] acpi_hotplug_work_fn+0x18/0x24
> >   [<ffffffff81061331>] process_one_work+0x261/0x450
> >   [<ffffffff81061a7e>] worker_thread+0x21e/0x370
> >   [<ffffffff81061860>] ? rescuer_thread+0x300/0x300
> >   [<ffffffff81068342>] kthread+0xd2/0xe0
> >   [<ffffffff81068270>] ? flush_kthread_worker+0x70/0x70
> >   [<ffffffff816c19bc>] ret_from_fork+0x7c/0xb0
> >   [<ffffffff81068270>] ? flush_kthread_worker+0x70/0x70
> >
> > (Mika Westerberg sees them too in his tests).
> >
> > Some investigation documented in kernel bug #65281 lead me to the
> > conclusion that the source of the problem is the device_del() in
> > pci_stop_dev() as it now causes the sysfs directory of the device
> > to be removed recursively along with all of its subdirectories.
> > That includes the sysfs directory of the device's subordinate
> > bus (dev->subordinate) and its "power" group.
> >
> > Consequently, when pci_remove_bus() is called for dev->subordinate
> > in pci_remove_bus_device(), it calls device_unregister(&bus->dev),
> > but at this point the sysfs directory of bus->dev doesn't exist any
> > more and its "power" group doesn't exist either.  Thus, when
> > dpm_sysfs_remove() called from device_del() tries to remove that
> > group, it triggers the above warning.
> >
> > That indicates a logical mistake in the design of
> > pci_stop_and_remove_bus_device(), which causes bus device objects
> > to be left behind their parents (bridge device objects) and can be
> > fixed by moving the device_del() from pci_stop_dev() into
> > pci_destroy_dev(), so pci_remove_bus() can be called for the
> > device's subordinate bus before the device itself is unregistered
> > from the hierarchy.  Still, the driver, if any, should be detached
> > from the device in pci_stop_dev(), so use device_release_driver()
> > directly from there.
> >
> > References: https://bugzilla.kernel.org/show_bug.cgi?id=65281#c6
> > Reported-by: Mika Westerberg <mika.westerberg@linux.intel.com>
> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > ---
> >  drivers/pci/remove.c |    4 +++-
> >  1 file changed, 3 insertions(+), 1 deletion(-)
> >
> > Index: linux-pm/drivers/pci/remove.c
> > ===================================================================
> > --- linux-pm.orig/drivers/pci/remove.c
> > +++ linux-pm/drivers/pci/remove.c
> > @@ -24,7 +24,7 @@ static void pci_stop_dev(struct pci_dev
> >         if (dev->is_added) {
> >                 pci_proc_detach_device(dev);
> >                 pci_remove_sysfs_dev_files(dev);
> > -               device_del(&dev->dev);
> > +               device_release_driver(&dev->dev);
> >                 dev->is_added = 0;
> >         }
> >
> > @@ -34,6 +34,8 @@ static void pci_stop_dev(struct pci_dev
> >
> >  static void pci_destroy_dev(struct pci_dev *dev)
> >  {
> > +       device_del(&dev->dev);
> > +
> >         down_write(&pci_bus_sem);
> >         list_del(&dev->bus_list);
> >         up_write(&pci_bus_sem);
> >
> 
> Maybe that is not enough. could still need add is_removed ...

Well, is_removed is only used by pci_destroy_dev() in your patch, right?

That means its only role is to protect the device from being destroyed
twice (or more times) in a row, but that surely would be a bug?  I don't
see how that can legitimately happen at least, so what exactly is the
scenario?

> Please check
> 
> https://lkml.org/lkml/2013/5/13/653
> PATCH 3/7] PCI: Detach driver in pci_stop_device

It looks like there are more patches needed to apply this one.

Thanks!
Rafael J. Wysocki Nov. 25, 2013, 11:23 a.m. UTC | #4
On Sunday, November 24, 2013 08:58:12 PM Yinghai Lu wrote:
> On Sun, Nov 24, 2013 at 8:54 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> > On Sat, Nov 23, 2013 at 4:17 PM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> >> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >>
> >> After commit bcdde7e221a8 (sysfs: make __sysfs_remove_dir() recursive)
> >> I'm seeing traces analogous to the one below in Thunderbolt testing:
> >>
> >> WARNING: CPU: 3 PID: 76 at /scratch/rafael/work/linux-pm/fs/sysfs/group.c:214 sysfs_remove_group+0x59/0xe0()
> >>  sysfs group ffffffff81c6c500 not found for kobject '0000:08'
> >>  Modules linked in: fuse hidp af_packet xt_tcpudp xt_pkttype xt_LOG xt_limit ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_raw ipt_REJECT iptable_raw xt_CT iptable_filter ip6table_mangle nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_ipv4 nf_defrag_ipv4 ip_tables xt_conntrack nf_conntrack rfcomm ip6table_filter bnep ip6_tables x_tables arc4 ath9k mac80211 x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul iTCO_wdt crc32c_intel iTCO_vendor_support ghash_clmulni_intel aesni_intel ablk_helper acer_wmi sparse_keymap ath9k_common ath9k_hw cryptd lrw gf128mul ath3k glue_helper aes_x86_64 btusb microcode ath pcspkr joydev uvcvideo serio_raw videobuf2_core i2c_i801 videodev snd_hda_codec_hdmi cfg80211 videobuf2_vmalloc tg3 videobuf2_memops sg ptp pps_core lpc_ich mfd_core snd_hda_codec_realtek bluetooth hid_logitech_dj rfkill snd_hda_intel snd_hda_codec shpchp battery ac wmi acpi_cpufreq edd snd_usb_audio snd_pcm snd_page_alloc snd_hwdep snd_usbmidi_lib
> >> snd_rawmidi snd_seq snd_timer snd_seq_device snd soundcore autofs4 xhci_hcd processor scsi_dh_hp_sw scsi_dh_rdac scsi_dh_emc scsi_dh_alua scsi_dh
> >>  CPU: 3 PID: 76 Comm: kworker/u16:7 Not tainted 3.13.0-rc1+ #76
> >>  Hardware name: Acer Aspire S5-391/Venus    , BIOS V1.02 05/29/2012
> >>  Workqueue: kacpi_hotplug acpi_hotplug_work_fn
> >>   0000000000000009 ffff8801644b9ac8 ffffffff816b23bf 0000000000000007
> >>   ffff8801644b9b18 ffff8801644b9b08 ffffffff81046607 ffff88016925b800
> >>   0000000000000000 ffffffff81c6c500 ffff88016924f928 ffff88016924f800
> >>  Call Trace:
> >>   [<ffffffff816b23bf>] dump_stack+0x4e/0x71
> >>   [<ffffffff81046607>] warn_slowpath_common+0x87/0xb0
> >>   [<ffffffff810466d1>] warn_slowpath_fmt+0x41/0x50
> >>   [<ffffffff811e42ef>] ? sysfs_get_dirent_ns+0x6f/0x80
> >>   [<ffffffff811e5389>] sysfs_remove_group+0x59/0xe0
> >>   [<ffffffff8149f00b>] dpm_sysfs_remove+0x3b/0x50
> >>   [<ffffffff81495818>] device_del+0x58/0x1c0
> >>   [<ffffffff814959c8>] device_unregister+0x48/0x60
> >>   [<ffffffff813254fe>] pci_remove_bus+0x6e/0x80
> >>   [<ffffffff81325548>] pci_remove_bus_device+0x38/0x110
> >>   [<ffffffff8132555d>] pci_remove_bus_device+0x4d/0x110
> >>   [<ffffffff81325639>] pci_stop_and_remove_bus_device+0x19/0x20
> >>   [<ffffffff813418d0>] disable_slot+0x20/0xe0
> >>   [<ffffffff81341a38>] acpiphp_check_bridge+0xa8/0xd0
> >>   [<ffffffff813427ad>] hotplug_event+0x17d/0x220
> >>   [<ffffffff81342880>] hotplug_event_work+0x30/0x70
> >>   [<ffffffff8136d665>] acpi_hotplug_work_fn+0x18/0x24
> >>   [<ffffffff81061331>] process_one_work+0x261/0x450
> >>   [<ffffffff81061a7e>] worker_thread+0x21e/0x370
> >>   [<ffffffff81061860>] ? rescuer_thread+0x300/0x300
> >>   [<ffffffff81068342>] kthread+0xd2/0xe0
> >>   [<ffffffff81068270>] ? flush_kthread_worker+0x70/0x70
> >>   [<ffffffff816c19bc>] ret_from_fork+0x7c/0xb0
> >>   [<ffffffff81068270>] ? flush_kthread_worker+0x70/0x70
> >>
> >> (Mika Westerberg sees them too in his tests).
> >>
> >> Some investigation documented in kernel bug #65281 lead me to the
> >> conclusion that the source of the problem is the device_del() in
> >> pci_stop_dev() as it now causes the sysfs directory of the device
> >> to be removed recursively along with all of its subdirectories.
> >> That includes the sysfs directory of the device's subordinate
> >> bus (dev->subordinate) and its "power" group.
> >>
> >> Consequently, when pci_remove_bus() is called for dev->subordinate
> >> in pci_remove_bus_device(), it calls device_unregister(&bus->dev),
> >> but at this point the sysfs directory of bus->dev doesn't exist any
> >> more and its "power" group doesn't exist either.  Thus, when
> >> dpm_sysfs_remove() called from device_del() tries to remove that
> >> group, it triggers the above warning.
> >>
> >> That indicates a logical mistake in the design of
> >> pci_stop_and_remove_bus_device(), which causes bus device objects
> >> to be left behind their parents (bridge device objects) and can be
> >> fixed by moving the device_del() from pci_stop_dev() into
> >> pci_destroy_dev(), so pci_remove_bus() can be called for the
> >> device's subordinate bus before the device itself is unregistered
> >> from the hierarchy.  Still, the driver, if any, should be detached
> >> from the device in pci_stop_dev(), so use device_release_driver()
> >> directly from there.
> >>
> >> References: https://bugzilla.kernel.org/show_bug.cgi?id=65281#c6
> >> Reported-by: Mika Westerberg <mika.westerberg@linux.intel.com>
> >> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >> ---
> >>  drivers/pci/remove.c |    4 +++-
> >>  1 file changed, 3 insertions(+), 1 deletion(-)
> >>
> >> Index: linux-pm/drivers/pci/remove.c
> >> ===================================================================
> >> --- linux-pm.orig/drivers/pci/remove.c
> >> +++ linux-pm/drivers/pci/remove.c
> >> @@ -24,7 +24,7 @@ static void pci_stop_dev(struct pci_dev
> >>         if (dev->is_added) {
> >>                 pci_proc_detach_device(dev);
> >>                 pci_remove_sysfs_dev_files(dev);
> >> -               device_del(&dev->dev);
> >> +               device_release_driver(&dev->dev);
> >>                 dev->is_added = 0;
> >>         }
> >>
> >> @@ -34,6 +34,8 @@ static void pci_stop_dev(struct pci_dev
> >>
> >>  static void pci_destroy_dev(struct pci_dev *dev)
> >>  {
> >> +       device_del(&dev->dev);
> >> +
> >>         down_write(&pci_bus_sem);
> >>         list_del(&dev->bus_list);
> >>         up_write(&pci_bus_sem);
> >>
> >
> > Maybe that is not enough. could still need add is_removed ...
> >
> > Please check
> >
> > https://lkml.org/lkml/2013/5/13/653
> > PATCH 3/7] PCI: Detach driver in pci_stop_device
> 
> or you can check if
> http://patchwork.ozlabs.org/patch/292622/
>   [1/6] PCI: move back pci_proc_attach_devices calling
> http://patchwork.ozlabs.org/patch/292623/
>   [2/6] PCI: move resources and bus_list releasing to pci_release_dev
> http://patchwork.ozlabs.org/patch/292624/
>   [3/6] PCI: Destroy pci dev only once
> 
> could help with you test setup.

I honestly don't think that all of the additional changes are needed to fix
this particular problem.

Thanks!
Rafael J. Wysocki Nov. 25, 2013, 11:24 a.m. UTC | #5
On Monday, November 25, 2013 11:47:07 AM Mika Westerberg wrote:
> On Sun, Nov 24, 2013 at 01:17:52AM +0100, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > 
> > After commit bcdde7e221a8 (sysfs: make __sysfs_remove_dir() recursive)
> > I'm seeing traces analogous to the one below in Thunderbolt testing:
> > 
> > WARNING: CPU: 3 PID: 76 at /scratch/rafael/work/linux-pm/fs/sysfs/group.c:214 sysfs_remove_group+0x59/0xe0()
> >  sysfs group ffffffff81c6c500 not found for kobject '0000:08'
> >  Modules linked in: fuse hidp af_packet xt_tcpudp xt_pkttype xt_LOG xt_limit ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_raw ipt_REJECT iptable_raw xt_CT iptable_filter ip6table_mangle nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_ipv4 nf_defrag_ipv4 ip_tables xt_conntrack nf_conntrack rfcomm ip6table_filter bnep ip6_tables x_tables arc4 ath9k mac80211 x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul iTCO_wdt crc32c_intel iTCO_vendor_support ghash_clmulni_intel aesni_intel ablk_helper acer_wmi sparse_keymap ath9k_common ath9k_hw cryptd lrw gf128mul ath3k glue_helper aes_x86_64 btusb microcode ath pcspkr joydev uvcvideo serio_raw videobuf2_core i2c_i801 videodev snd_hda_codec_hdmi cfg80211 videobuf2_vmalloc tg3 videobuf2_memops sg ptp pps_core lpc_ich mfd_core snd_hda_codec_realtek bluetooth hid_logitech_dj rfkill snd_hda_intel snd_hda_codec shpchp battery ac wmi acpi_cpufreq edd snd_usb_audio snd_pcm snd_page_alloc snd_hwdep snd_usbmidi_lib 
> > snd_rawmidi snd_seq snd_timer snd_seq_device snd soundcore autofs4 xhci_hcd processor scsi_dh_hp_sw scsi_dh_rdac scsi_dh_emc scsi_dh_alua scsi_dh
> >  CPU: 3 PID: 76 Comm: kworker/u16:7 Not tainted 3.13.0-rc1+ #76
> >  Hardware name: Acer Aspire S5-391/Venus    , BIOS V1.02 05/29/2012
> >  Workqueue: kacpi_hotplug acpi_hotplug_work_fn
> >   0000000000000009 ffff8801644b9ac8 ffffffff816b23bf 0000000000000007
> >   ffff8801644b9b18 ffff8801644b9b08 ffffffff81046607 ffff88016925b800
> >   0000000000000000 ffffffff81c6c500 ffff88016924f928 ffff88016924f800
> >  Call Trace:
> >   [<ffffffff816b23bf>] dump_stack+0x4e/0x71
> >   [<ffffffff81046607>] warn_slowpath_common+0x87/0xb0
> >   [<ffffffff810466d1>] warn_slowpath_fmt+0x41/0x50
> >   [<ffffffff811e42ef>] ? sysfs_get_dirent_ns+0x6f/0x80
> >   [<ffffffff811e5389>] sysfs_remove_group+0x59/0xe0
> >   [<ffffffff8149f00b>] dpm_sysfs_remove+0x3b/0x50
> >   [<ffffffff81495818>] device_del+0x58/0x1c0
> >   [<ffffffff814959c8>] device_unregister+0x48/0x60
> >   [<ffffffff813254fe>] pci_remove_bus+0x6e/0x80
> >   [<ffffffff81325548>] pci_remove_bus_device+0x38/0x110
> >   [<ffffffff8132555d>] pci_remove_bus_device+0x4d/0x110
> >   [<ffffffff81325639>] pci_stop_and_remove_bus_device+0x19/0x20
> >   [<ffffffff813418d0>] disable_slot+0x20/0xe0
> >   [<ffffffff81341a38>] acpiphp_check_bridge+0xa8/0xd0
> >   [<ffffffff813427ad>] hotplug_event+0x17d/0x220
> >   [<ffffffff81342880>] hotplug_event_work+0x30/0x70
> >   [<ffffffff8136d665>] acpi_hotplug_work_fn+0x18/0x24
> >   [<ffffffff81061331>] process_one_work+0x261/0x450
> >   [<ffffffff81061a7e>] worker_thread+0x21e/0x370
> >   [<ffffffff81061860>] ? rescuer_thread+0x300/0x300
> >   [<ffffffff81068342>] kthread+0xd2/0xe0
> >   [<ffffffff81068270>] ? flush_kthread_worker+0x70/0x70
> >   [<ffffffff816c19bc>] ret_from_fork+0x7c/0xb0
> >   [<ffffffff81068270>] ? flush_kthread_worker+0x70/0x70
> > 
> > (Mika Westerberg sees them too in his tests).
> > 
> > Some investigation documented in kernel bug #65281 lead me to the
> > conclusion that the source of the problem is the device_del() in
> > pci_stop_dev() as it now causes the sysfs directory of the device
> > to be removed recursively along with all of its subdirectories.
> > That includes the sysfs directory of the device's subordinate
> > bus (dev->subordinate) and its "power" group.
> > 
> > Consequently, when pci_remove_bus() is called for dev->subordinate
> > in pci_remove_bus_device(), it calls device_unregister(&bus->dev),
> > but at this point the sysfs directory of bus->dev doesn't exist any
> > more and its "power" group doesn't exist either.  Thus, when
> > dpm_sysfs_remove() called from device_del() tries to remove that
> > group, it triggers the above warning.
> > 
> > That indicates a logical mistake in the design of
> > pci_stop_and_remove_bus_device(), which causes bus device objects
> > to be left behind their parents (bridge device objects) and can be
> > fixed by moving the device_del() from pci_stop_dev() into
> > pci_destroy_dev(), so pci_remove_bus() can be called for the
> > device's subordinate bus before the device itself is unregistered
> > from the hierarchy.  Still, the driver, if any, should be detached
> > from the device in pci_stop_dev(), so use device_release_driver()
> > directly from there.
> 
> With only this patch applied, I can still see the warnings :-/ I'm going to
> test the ata fix you proposed soon.

Yes, you need both this one and the ATA fix.

Thanks,
Rafael

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Yinghai Lu Nov. 25, 2013, 7:45 p.m. UTC | #6
On Mon, Nov 25, 2013 at 3:22 AM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
>
> Well, is_removed is only used by pci_destroy_dev() in your patch, right?
>
> That means its only role is to protect the device from being destroyed
> twice (or more times) in a row, but that surely would be a bug?  I don't
> see how that can legitimately happen at least, so what exactly is the
> scenario?

The thread:
https://patchwork.kernel.org/patch/3119001/
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Yinghai Lu Nov. 25, 2013, 7:48 p.m. UTC | #7
On Mon, Nov 25, 2013 at 3:23 AM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
>>
>> or you can check if
>> http://patchwork.ozlabs.org/patch/292622/
>>   [1/6] PCI: move back pci_proc_attach_devices calling
>> http://patchwork.ozlabs.org/patch/292623/
>>   [2/6] PCI: move resources and bus_list releasing to pci_release_dev
>> http://patchwork.ozlabs.org/patch/292624/
>>   [3/6] PCI: Destroy pci dev only once
>>
>> could help with you test setup.
>
> I honestly don't think that all of the additional changes are needed to fix
> this particular problem.

I thought
[3/6] PCI: Destroy pci dev only once
should be enough, but it depend change on other two.

Thanks

Yinghai
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rafael J. Wysocki Nov. 25, 2013, 8:57 p.m. UTC | #8
On Monday, November 25, 2013 11:45:50 AM Yinghai Lu wrote:
> On Mon, Nov 25, 2013 at 3:22 AM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> >
> > Well, is_removed is only used by pci_destroy_dev() in your patch, right?
> >
> > That means its only role is to protect the device from being destroyed
> > twice (or more times) in a row, but that surely would be a bug?  I don't
> > see how that can legitimately happen at least, so what exactly is the
> > scenario?
> 
> The thread:
> https://patchwork.kernel.org/patch/3119001/

Thanks for the pointer.

Well, so we have a bug in there and it is a *race* so adding a device flag
is not going to really help.  Besides, if the put_device() really frees the
struct pci_dev, accessing the flag itself would be a use-after-free,
wouldn't it?

What seems to be necessary is a lock preventing the
/sys/bus/pci/devices/.../remove interface from being used on multiple devices
in parallel.  Of course, it also has to protect against removals from hotplug
events racing with removals from /sys/bus/pci/devices/.../remove.

In any case, it is beyond the scope of the $subject patch, though.

Thanks,
Rafael

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bjorn Helgaas Nov. 25, 2013, 9:59 p.m. UTC | #9
On Sat, Nov 23, 2013 at 5:17 PM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> After commit bcdde7e221a8 (sysfs: make __sysfs_remove_dir() recursive)
> I'm seeing traces analogous to the one below in Thunderbolt testing:
>
> WARNING: CPU: 3 PID: 76 at /scratch/rafael/work/linux-pm/fs/sysfs/group.c:214 sysfs_remove_group+0x59/0xe0()
>  sysfs group ffffffff81c6c500 not found for kobject '0000:08'
>  Modules linked in: fuse hidp af_packet xt_tcpudp xt_pkttype xt_LOG xt_limit ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_raw ipt_REJECT iptable_raw xt_CT iptable_filter ip6table_mangle nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_ipv4 nf_defrag_ipv4 ip_tables xt_conntrack nf_conntrack rfcomm ip6table_filter bnep ip6_tables x_tables arc4 ath9k mac80211 x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul iTCO_wdt crc32c_intel iTCO_vendor_support ghash_clmulni_intel aesni_intel ablk_helper acer_wmi sparse_keymap ath9k_common ath9k_hw cryptd lrw gf128mul ath3k glue_helper aes_x86_64 btusb microcode ath pcspkr joydev uvcvideo serio_raw videobuf2_core i2c_i801 videodev snd_hda_codec_hdmi cfg80211 videobuf2_vmalloc tg3 videobuf2_memops sg ptp pps_core lpc_ich mfd_core snd_hda_codec_realtek bluetooth hid_logitech_dj rfkill snd_hda_intel snd_hda_codec shpchp battery ac wmi acpi_cpufreq edd snd_usb_audio snd_pcm snd_page_alloc snd_hwdep snd_usbmidi_lib
> snd_rawmidi snd_seq snd_timer snd_seq_device snd soundcore autofs4 xhci_hcd processor scsi_dh_hp_sw scsi_dh_rdac scsi_dh_emc scsi_dh_alua scsi_dh
>  CPU: 3 PID: 76 Comm: kworker/u16:7 Not tainted 3.13.0-rc1+ #76
>  Hardware name: Acer Aspire S5-391/Venus    , BIOS V1.02 05/29/2012
>  Workqueue: kacpi_hotplug acpi_hotplug_work_fn
>   0000000000000009 ffff8801644b9ac8 ffffffff816b23bf 0000000000000007
>   ffff8801644b9b18 ffff8801644b9b08 ffffffff81046607 ffff88016925b800
>   0000000000000000 ffffffff81c6c500 ffff88016924f928 ffff88016924f800
>  Call Trace:
>   [<ffffffff816b23bf>] dump_stack+0x4e/0x71
>   [<ffffffff81046607>] warn_slowpath_common+0x87/0xb0
>   [<ffffffff810466d1>] warn_slowpath_fmt+0x41/0x50
>   [<ffffffff811e42ef>] ? sysfs_get_dirent_ns+0x6f/0x80
>   [<ffffffff811e5389>] sysfs_remove_group+0x59/0xe0
>   [<ffffffff8149f00b>] dpm_sysfs_remove+0x3b/0x50
>   [<ffffffff81495818>] device_del+0x58/0x1c0
>   [<ffffffff814959c8>] device_unregister+0x48/0x60
>   [<ffffffff813254fe>] pci_remove_bus+0x6e/0x80
>   [<ffffffff81325548>] pci_remove_bus_device+0x38/0x110
>   [<ffffffff8132555d>] pci_remove_bus_device+0x4d/0x110
>   [<ffffffff81325639>] pci_stop_and_remove_bus_device+0x19/0x20
>   [<ffffffff813418d0>] disable_slot+0x20/0xe0
>   [<ffffffff81341a38>] acpiphp_check_bridge+0xa8/0xd0
>   [<ffffffff813427ad>] hotplug_event+0x17d/0x220
>   [<ffffffff81342880>] hotplug_event_work+0x30/0x70
>   [<ffffffff8136d665>] acpi_hotplug_work_fn+0x18/0x24
>   [<ffffffff81061331>] process_one_work+0x261/0x450
>   [<ffffffff81061a7e>] worker_thread+0x21e/0x370
>   [<ffffffff81061860>] ? rescuer_thread+0x300/0x300
>   [<ffffffff81068342>] kthread+0xd2/0xe0
>   [<ffffffff81068270>] ? flush_kthread_worker+0x70/0x70
>   [<ffffffff816c19bc>] ret_from_fork+0x7c/0xb0
>   [<ffffffff81068270>] ? flush_kthread_worker+0x70/0x70
>
> (Mika Westerberg sees them too in his tests).
>
> Some investigation documented in kernel bug #65281 lead me to the
> conclusion that the source of the problem is the device_del() in
> pci_stop_dev() as it now causes the sysfs directory of the device
> to be removed recursively along with all of its subdirectories.
> That includes the sysfs directory of the device's subordinate
> bus (dev->subordinate) and its "power" group.
>
> Consequently, when pci_remove_bus() is called for dev->subordinate
> in pci_remove_bus_device(), it calls device_unregister(&bus->dev),
> but at this point the sysfs directory of bus->dev doesn't exist any
> more and its "power" group doesn't exist either.  Thus, when
> dpm_sysfs_remove() called from device_del() tries to remove that
> group, it triggers the above warning.
>
> That indicates a logical mistake in the design of
> pci_stop_and_remove_bus_device(), which causes bus device objects
> to be left behind their parents (bridge device objects) and can be
> fixed by moving the device_del() from pci_stop_dev() into
> pci_destroy_dev(), so pci_remove_bus() can be called for the
> device's subordinate bus before the device itself is unregistered
> from the hierarchy.  Still, the driver, if any, should be detached
> from the device in pci_stop_dev(), so use device_release_driver()
> directly from there.
>
> References: https://bugzilla.kernel.org/show_bug.cgi?id=65281#c6
> Reported-by: Mika Westerberg <mika.westerberg@linux.intel.com>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Applied to for-linus for v3.13.  Thanks a lot for all your work on this issue!

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rafael J. Wysocki Nov. 25, 2013, 10:19 p.m. UTC | #10
On Monday, November 25, 2013 02:59:44 PM Bjorn Helgaas wrote:
> On Sat, Nov 23, 2013 at 5:17 PM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >
> > After commit bcdde7e221a8 (sysfs: make __sysfs_remove_dir() recursive)
> > I'm seeing traces analogous to the one below in Thunderbolt testing:

[...]

> > (Mika Westerberg sees them too in his tests).
> >
> > Some investigation documented in kernel bug #65281 lead me to the
> > conclusion that the source of the problem is the device_del() in
> > pci_stop_dev() as it now causes the sysfs directory of the device
> > to be removed recursively along with all of its subdirectories.
> > That includes the sysfs directory of the device's subordinate
> > bus (dev->subordinate) and its "power" group.
> >
> > Consequently, when pci_remove_bus() is called for dev->subordinate
> > in pci_remove_bus_device(), it calls device_unregister(&bus->dev),
> > but at this point the sysfs directory of bus->dev doesn't exist any
> > more and its "power" group doesn't exist either.  Thus, when
> > dpm_sysfs_remove() called from device_del() tries to remove that
> > group, it triggers the above warning.
> >
> > That indicates a logical mistake in the design of
> > pci_stop_and_remove_bus_device(), which causes bus device objects
> > to be left behind their parents (bridge device objects) and can be
> > fixed by moving the device_del() from pci_stop_dev() into
> > pci_destroy_dev(), so pci_remove_bus() can be called for the
> > device's subordinate bus before the device itself is unregistered
> > from the hierarchy.  Still, the driver, if any, should be detached
> > from the device in pci_stop_dev(), so use device_release_driver()
> > directly from there.
> >
> > References: https://bugzilla.kernel.org/show_bug.cgi?id=65281#c6
> > Reported-by: Mika Westerberg <mika.westerberg@linux.intel.com>
> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> 
> Applied to for-linus for v3.13.  Thanks a lot for all your work on this issue!

Thanks, and no problem! :-)

Rafael

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jingoo Han Nov. 28, 2013, 12:41 a.m. UTC | #11
On Tuesday, November 26, 2013 7:00 AM, Bjorn Helgaas wrote:
> On Sat, Nov 23, 2013 at 5:17 PM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >
> > After commit bcdde7e221a8 (sysfs: make __sysfs_remove_dir() recursive)
> > I'm seeing traces analogous to the one below in Thunderbolt testing:
> >
> > WARNING: CPU: 3 PID: 76 at /scratch/rafael/work/linux-pm/fs/sysfs/group.c:214
> sysfs_remove_group+0x59/0xe0()
> >  sysfs group ffffffff81c6c500 not found for kobject '0000:08'
> >  Modules linked in: fuse hidp af_packet xt_tcpudp xt_pkttype xt_LOG xt_limit ip6t_REJECT
> nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_raw ipt_REJECT iptable_raw xt_CT iptable_filter
> ip6table_mangle nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_ipv4 nf_defrag_ipv4
> ip_tables xt_conntrack nf_conntrack rfcomm ip6table_filter bnep ip6_tables x_tables arc4 ath9k
> mac80211 x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul iTCO_wdt
> crc32c_intel iTCO_vendor_support ghash_clmulni_intel aesni_intel ablk_helper acer_wmi sparse_keymap
> ath9k_common ath9k_hw cryptd lrw gf128mul ath3k glue_helper aes_x86_64 btusb microcode ath pcspkr
> joydev uvcvideo serio_raw videobuf2_core i2c_i801 videodev snd_hda_codec_hdmi cfg80211
> videobuf2_vmalloc tg3 videobuf2_memops sg ptp pps_core lpc_ich mfd_core snd_hda_codec_realtek
> bluetooth hid_logitech_dj rfkill snd_hda_intel snd_hda_codec shpchp battery ac wmi acpi_cpufreq edd
> snd_usb_audio snd_pcm snd_page_alloc snd_hwdep
>  snd_usbmidi_lib
> > snd_rawmidi snd_seq snd_timer snd_seq_device snd soundcore autofs4 xhci_hcd processor scsi_dh_hp_sw
> scsi_dh_rdac scsi_dh_emc scsi_dh_alua scsi_dh
> >  CPU: 3 PID: 76 Comm: kworker/u16:7 Not tainted 3.13.0-rc1+ #76
> >  Hardware name: Acer Aspire S5-391/Venus    , BIOS V1.02 05/29/2012
> >  Workqueue: kacpi_hotplug acpi_hotplug_work_fn
> >   0000000000000009 ffff8801644b9ac8 ffffffff816b23bf 0000000000000007
> >   ffff8801644b9b18 ffff8801644b9b08 ffffffff81046607 ffff88016925b800
> >   0000000000000000 ffffffff81c6c500 ffff88016924f928 ffff88016924f800
> >  Call Trace:
> >   [<ffffffff816b23bf>] dump_stack+0x4e/0x71
> >   [<ffffffff81046607>] warn_slowpath_common+0x87/0xb0
> >   [<ffffffff810466d1>] warn_slowpath_fmt+0x41/0x50
> >   [<ffffffff811e42ef>] ? sysfs_get_dirent_ns+0x6f/0x80
> >   [<ffffffff811e5389>] sysfs_remove_group+0x59/0xe0
> >   [<ffffffff8149f00b>] dpm_sysfs_remove+0x3b/0x50
> >   [<ffffffff81495818>] device_del+0x58/0x1c0
> >   [<ffffffff814959c8>] device_unregister+0x48/0x60
> >   [<ffffffff813254fe>] pci_remove_bus+0x6e/0x80
> >   [<ffffffff81325548>] pci_remove_bus_device+0x38/0x110
> >   [<ffffffff8132555d>] pci_remove_bus_device+0x4d/0x110
> >   [<ffffffff81325639>] pci_stop_and_remove_bus_device+0x19/0x20
> >   [<ffffffff813418d0>] disable_slot+0x20/0xe0
> >   [<ffffffff81341a38>] acpiphp_check_bridge+0xa8/0xd0
> >   [<ffffffff813427ad>] hotplug_event+0x17d/0x220
> >   [<ffffffff81342880>] hotplug_event_work+0x30/0x70
> >   [<ffffffff8136d665>] acpi_hotplug_work_fn+0x18/0x24
> >   [<ffffffff81061331>] process_one_work+0x261/0x450
> >   [<ffffffff81061a7e>] worker_thread+0x21e/0x370
> >   [<ffffffff81061860>] ? rescuer_thread+0x300/0x300
> >   [<ffffffff81068342>] kthread+0xd2/0xe0
> >   [<ffffffff81068270>] ? flush_kthread_worker+0x70/0x70
> >   [<ffffffff816c19bc>] ret_from_fork+0x7c/0xb0
> >   [<ffffffff81068270>] ? flush_kthread_worker+0x70/0x70
> >
> > (Mika Westerberg sees them too in his tests).
> >
> > Some investigation documented in kernel bug #65281 lead me to the
> > conclusion that the source of the problem is the device_del() in
> > pci_stop_dev() as it now causes the sysfs directory of the device
> > to be removed recursively along with all of its subdirectories.
> > That includes the sysfs directory of the device's subordinate
> > bus (dev->subordinate) and its "power" group.
> >
> > Consequently, when pci_remove_bus() is called for dev->subordinate
> > in pci_remove_bus_device(), it calls device_unregister(&bus->dev),
> > but at this point the sysfs directory of bus->dev doesn't exist any
> > more and its "power" group doesn't exist either.  Thus, when
> > dpm_sysfs_remove() called from device_del() tries to remove that
> > group, it triggers the above warning.
> >
> > That indicates a logical mistake in the design of
> > pci_stop_and_remove_bus_device(), which causes bus device objects
> > to be left behind their parents (bridge device objects) and can be
> > fixed by moving the device_del() from pci_stop_dev() into
> > pci_destroy_dev(), so pci_remove_bus() can be called for the
> > device's subordinate bus before the device itself is unregistered
> > from the hierarchy.  Still, the driver, if any, should be detached
> > from the device in pci_stop_dev(), so use device_release_driver()
> > directly from there.
> >
> > References: https://bugzilla.kernel.org/show_bug.cgi?id=65281#c6
> > Reported-by: Mika Westerberg <mika.westerberg@linux.intel.com>
> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> 
> Applied to for-linus for v3.13.  Thanks a lot for all your work on this issue!

Hi Rafael J. Wysocki,

A week ago, I found this warning when removing PCI devices.
However, I was not able to fix this warning.

Today, I tested your patch on Samsung Exynos platform
with PCI-LAN card. I checked that this warning is resolved.
I really appreciate you patch! :-)
Thank you.

Best regards,
Jingoo Han

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

Index: linux-pm/drivers/pci/remove.c
===================================================================
--- linux-pm.orig/drivers/pci/remove.c
+++ linux-pm/drivers/pci/remove.c
@@ -24,7 +24,7 @@  static void pci_stop_dev(struct pci_dev
 	if (dev->is_added) {
 		pci_proc_detach_device(dev);
 		pci_remove_sysfs_dev_files(dev);
-		device_del(&dev->dev);
+		device_release_driver(&dev->dev);
 		dev->is_added = 0;
 	}
 
@@ -34,6 +34,8 @@  static void pci_stop_dev(struct pci_dev
 
 static void pci_destroy_dev(struct pci_dev *dev)
 {
+	device_del(&dev->dev);
+
 	down_write(&pci_bus_sem);
 	list_del(&dev->bus_list);
 	up_write(&pci_bus_sem);