diff mbox

[3.11.4] Thunderbolt/PCI unplug oops in pci_pme_list_scan

Message ID 20131119013343.GA17294@google.com
State Not Applicable
Headers show

Commit Message

Bjorn Helgaas Nov. 19, 2013, 1:33 a.m. UTC
On Fri, Nov 15, 2013 at 01:52:35PM +0200, Mika Westerberg wrote:
> On Thu, Oct 24, 2013 at 09:33:50PM -0600, Bjorn Helgaas wrote:
> > On Wed, Oct 23, 2013 at 11:53 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> > > On Tue, Oct 22, 2013 at 8:32 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> > >> On Thu, Oct 17, 2013 at 7:59 AM, Andreas Noever <andreas.noever@gmail.com> wrote:
> > >>> On Wed, Oct 16, 2013 at 10:21 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> > >>>> On Tue, Oct 15, 2013 at 03:44:52AM +0100, Matthew Garrett wrote:
> > >>>>> On Mon, Oct 14, 2013 at 05:50:38PM -0600, Bjorn Helgaas wrote:
> > >>>>> > On Mon, Oct 14, 2013 at 4:47 PM, Andreas Noever <andreas.noever@gmail.com> wrote:
> > >>>>> > > When I unplug the Thunderbolt ethernet adapter on my MacBookPro Linux
> > >>>>> > > crashes a few seconds later. Using
> > >>>>> > > echo 1 > /sys/bus/pci/devices/0000:08:00.0/remove
> > >>>>> > > to remove a bridge two levels above the device triggers the fault immediately:
> > 
> > >>>> We save a pci_dev pointer in the pci_pme_list, which of course has a
> > >>>> longer lifetime than the pci_dev itself, but we don't acquire a reference
> > >>>> on it, so I suspect the pci_dev got released before we got around to
> > >>>> doing the pci_pme_list_scan().
> > >>>>
> > >>>> Andreas, can you try the patch below?  It's against v3.12-rc2, but it
> > >>>> should apply to v3.11, too.
> > >>>
> > >>> I have tested your patch against 3.11 where it solves the problem. Thanks!
> > >>>
> > >>> Unfortunately I could not reproduce the problem in 3.12-rc5. I only
> > >>> get the following warning (and no crash):
> > >>>
> > >>> tg3 0000:0a:00.0: PME# disabled
> > >>> pcieport 0000:09:00.0: PME# disabled
> > >>> pciehp 0000:09:00.0:pcie24: unloading service driver pciehp
> > >>> pci_bus 0000:0a: dev 00, dec refcount to 0
> > >>> pci_bus 0000:0a: dev 00, released physical slot 9
> > >>> ------------[ cut here ]------------
> > >>> WARNING: CPU: 0 PID: 122 at drivers/pci/pci.c:1430
> > >>> pci_disable_device+0x84/0x90()
> > >>> Device pcieport
> > >>> disabling already-disabled device
> > >>> ...
> > 
> > >>> Bisection points to 928bea964827d7824b548c1f8e06eccbbc4d0d7d .
> > >>
> > >> This is "PCI: Delay enabling bridges until they're needed" by Yinghai.
> > >
> > > that double disabling should be addressed by:
> > >
> > > https://lkml.org/lkml/2013/4/25/608
> > >
> > > [PATCH] PCI: Remove duplicate pci_disable_device for pcie port
> > 
> > I'll look at that patch again.  I had some questions about it the
> > first time, but perhaps it makes more sense after 928bea9648 has been
> > applied.
> 
> Bjorn,
> 
> Are there any plans to apply the above patch?
> 
> I'm seeing that warning on all my TBT test machines:
> 
> [  122.914180] pcieport 0000:06:05.0: PME# disabled
> [  122.915386] ------------[ cut here ]------------
> [  122.916513] WARNING: CPU: 0 PID: 1060 at drivers/pci/pci.c:1430 pci_disable_device+0x7c/0x90()
> [  122.917589] Device pcieport
> [  122.917589] disabling already-disabled device

I fixed the changelog (the extra disable was actually added by d899871936,
not by dc5351784e) and put the patch below in my for-linus branch.  I'll
ask Linus to pull it later this week.

Sorry for the delay, and thanks for the reminder.

Bjorn


PCI: Remove duplicate pci_disable_device() from pcie_portdrv_remove()

From: Yinghai Lu <yinghai@kernel.org>

The pcie_portdrv .probe() method calls pci_enable_device() once, in
pcie_port_device_register(), but the .remove() method calls
pci_disable_device() twice, in pcie_port_device_remove() and in
pcie_portdrv_remove().

That causes a "disabling already-disabled device" warning when removing a
PCIe port device.  This happens all the time when removing Thunderbolt
devices, but is also easy to reproduce with, e.g.,
"echo 0000:00:1c.3 > /sys/bus/pci/drivers/pcieport/unbind"

This patch removes the disable from pcie_portdrv_remove().

[bhelgaas: changelog, tag for stable]
Reported-by: David Bulkow <David.Bulkow@stratus.com>
Reported-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: stable@vger.kernel.org	# v2.6.32+
---
 drivers/pci/pcie/portdrv_pci.c |    1 -
 1 file changed, 1 deletion(-)

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Yijing Wang Nov. 19, 2013, 1:54 a.m. UTC | #1
> The pcie_portdrv .probe() method calls pci_enable_device() once, in
> pcie_port_device_register(), but the .remove() method calls
> pci_disable_device() twice, in pcie_port_device_remove() and in
> pcie_portdrv_remove().
> 
> That causes a "disabling already-disabled device" warning when removing a
> PCIe port device.  This happens all the time when removing Thunderbolt
> devices, but is also easy to reproduce with, e.g.,
> "echo 0000:00:1c.3 > /sys/bus/pci/drivers/pcieport/unbind"
> 
> This patch removes the disable from pcie_portdrv_remove().
> 
> [bhelgaas: changelog, tag for stable]
> Reported-by: David Bulkow <David.Bulkow@stratus.com>
> Reported-by: Mika Westerberg <mika.westerberg@linux.intel.com>
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
> CC: stable@vger.kernel.org	# v2.6.32+

Hi Bjorn,
   This issue in X86 seems to be introduced after commit 928bea9 "PCI: Delay enabling bridges until they're needed"
So this patch needs to back port to 2.6.32+ ?

> ---
>  drivers/pci/pcie/portdrv_pci.c |    1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/drivers/pci/pcie/portdrv_pci.c b/drivers/pci/pcie/portdrv_pci.c
> index cd1e57e51aa7..0d8fdc48e642 100644
> --- a/drivers/pci/pcie/portdrv_pci.c
> +++ b/drivers/pci/pcie/portdrv_pci.c
> @@ -223,7 +223,6 @@ static int pcie_portdrv_probe(struct pci_dev *dev,
>  static void pcie_portdrv_remove(struct pci_dev *dev)
>  {
>  	pcie_port_device_remove(dev);
> -	pci_disable_device(dev);
>  }
>  
>  static int error_detected_iter(struct device *device, void *data)
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 
> .
>
Mika Westerberg Nov. 19, 2013, 10:06 a.m. UTC | #2
On Mon, Nov 18, 2013 at 06:33:43PM -0700, Bjorn Helgaas wrote:
> I fixed the changelog (the extra disable was actually added by d899871936,
> not by dc5351784e) and put the patch below in my for-linus branch.  I'll
> ask Linus to pull it later this week.
> 
> Sorry for the delay, and thanks for the reminder.

Thanks!
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bjorn Helgaas Nov. 19, 2013, 5:18 p.m. UTC | #3
On Tue, Nov 19, 2013 at 09:54:58AM +0800, Yijing Wang wrote:
> > The pcie_portdrv .probe() method calls pci_enable_device() once, in
> > pcie_port_device_register(), but the .remove() method calls
> > pci_disable_device() twice, in pcie_port_device_remove() and in
> > pcie_portdrv_remove().
> > 
> > That causes a "disabling already-disabled device" warning when removing a
> > PCIe port device.  This happens all the time when removing Thunderbolt
> > devices, but is also easy to reproduce with, e.g.,
> > "echo 0000:00:1c.3 > /sys/bus/pci/drivers/pcieport/unbind"
> > 
> > This patch removes the disable from pcie_portdrv_remove().
> > 
> > [bhelgaas: changelog, tag for stable]
> > Reported-by: David Bulkow <David.Bulkow@stratus.com>
> > Reported-by: Mika Westerberg <mika.westerberg@linux.intel.com>
> > Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> > Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
> > CC: stable@vger.kernel.org	# v2.6.32+
> 
> Hi Bjorn,
>    This issue in X86 seems to be introduced after commit 928bea9 "PCI: Delay enabling bridges until they're needed"
> So this patch needs to back port to 2.6.32+ ?

928bea9 might have made it more visible, but the underlying problem is that
we enable the device once in the probe path, and disable it twice in the
remove path.  That problem exists in 2.6.32.61:

  pcie_portdrv_probe                # .probe() method
    pcie_port_device_register
      pci_enable_device             <-- enable

  pcie_portdrv_remove               # .remove() method
    pcie_port_device_remove
      pci_disable_device            <-- disable #1
    pci_disable_device              <-- disable #2

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Yijing Wang Nov. 20, 2013, 1:14 a.m. UTC | #4
>>> [bhelgaas: changelog, tag for stable]
>>> Reported-by: David Bulkow <David.Bulkow@stratus.com>
>>> Reported-by: Mika Westerberg <mika.westerberg@linux.intel.com>
>>> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
>>> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
>>> CC: stable@vger.kernel.org	# v2.6.32+
>>
>> Hi Bjorn,
>>    This issue in X86 seems to be introduced after commit 928bea9 "PCI: Delay enabling bridges until they're needed"
>> So this patch needs to back port to 2.6.32+ ?
> 
> 928bea9 might have made it more visible, but the underlying problem is that
> we enable the device once in the probe path, and disable it twice in the
> remove path.  That problem exists in 2.6.32.61:
> 
>   pcie_portdrv_probe                # .probe() method
>     pcie_port_device_register
>       pci_enable_device             <-- enable
> 
>   pcie_portdrv_remove               # .remove() method
>     pcie_port_device_remove
>       pci_disable_device            <-- disable #1
>     pci_disable_device              <-- disable #2

During assign unassigned resources, we also enable the port device,

fs_initcall(pcibios_assign_resources);
  pci_assign_unassigned_resources;
    pci_enable_bridges()
      pci_enable_device()


So I think before the commit 928bea9 , the pci bridge device enable and disable is symmetrical.
After the commit 928bea9, we only enable bridge once, but still remove twice.

Thanks!
Yijing.
Bjorn Helgaas Nov. 20, 2013, 1:20 a.m. UTC | #5
On Tue, Nov 19, 2013 at 6:14 PM, Yijing Wang <wangyijing@huawei.com> wrote:
>>>> [bhelgaas: changelog, tag for stable]
>>>> Reported-by: David Bulkow <David.Bulkow@stratus.com>
>>>> Reported-by: Mika Westerberg <mika.westerberg@linux.intel.com>
>>>> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
>>>> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
>>>> CC: stable@vger.kernel.org  # v2.6.32+
>>>
>>> Hi Bjorn,
>>>    This issue in X86 seems to be introduced after commit 928bea9 "PCI: Delay enabling bridges until they're needed"
>>> So this patch needs to back port to 2.6.32+ ?
>>
>> 928bea9 might have made it more visible, but the underlying problem is that
>> we enable the device once in the probe path, and disable it twice in the
>> remove path.  That problem exists in 2.6.32.61:
>>
>>   pcie_portdrv_probe                # .probe() method
>>     pcie_port_device_register
>>       pci_enable_device             <-- enable
>>
>>   pcie_portdrv_remove               # .remove() method
>>     pcie_port_device_remove
>>       pci_disable_device            <-- disable #1
>>     pci_disable_device              <-- disable #2
>
> During assign unassigned resources, we also enable the port device,
>
> fs_initcall(pcibios_assign_resources);
>   pci_assign_unassigned_resources;
>     pci_enable_bridges()
>       pci_enable_device()
>
>
> So I think before the commit 928bea9 , the pci bridge device enable and disable is symmetrical.
> After the commit 928bea9, we only enable bridge once, but still remove twice.

The port driver should be symmetrical, regardless of what happens
outside it.  We have to be able to bind/unbind/bind/unbind
indefinitely.

Do you think the patch is a problem for current upstream, or are you
just saying it doesn't need to be backported as far as 2.6.32?  I
frankly don't care that much if those old kernels pick it up or not.
All I'm saying is that the problem this fixes is present that far
back.

I'm not really interested in doing a lot more digging about ancient
kernels, unless you think it's going to break something if applied to
them.

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Yijing Wang Nov. 20, 2013, 1:39 a.m. UTC | #6
On 2013/11/20 9:20, Bjorn Helgaas wrote:
> On Tue, Nov 19, 2013 at 6:14 PM, Yijing Wang <wangyijing@huawei.com> wrote:
>>>>> [bhelgaas: changelog, tag for stable]
>>>>> Reported-by: David Bulkow <David.Bulkow@stratus.com>
>>>>> Reported-by: Mika Westerberg <mika.westerberg@linux.intel.com>
>>>>> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
>>>>> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
>>>>> CC: stable@vger.kernel.org  # v2.6.32+
>>>>
>>>> Hi Bjorn,
>>>>    This issue in X86 seems to be introduced after commit 928bea9 "PCI: Delay enabling bridges until they're needed"
>>>> So this patch needs to back port to 2.6.32+ ?
>>>
>>> 928bea9 might have made it more visible, but the underlying problem is that
>>> we enable the device once in the probe path, and disable it twice in the
>>> remove path.  That problem exists in 2.6.32.61:
>>>
>>>   pcie_portdrv_probe                # .probe() method
>>>     pcie_port_device_register
>>>       pci_enable_device             <-- enable
>>>
>>>   pcie_portdrv_remove               # .remove() method
>>>     pcie_port_device_remove
>>>       pci_disable_device            <-- disable #1
>>>     pci_disable_device              <-- disable #2
>>
>> During assign unassigned resources, we also enable the port device,
>>
>> fs_initcall(pcibios_assign_resources);
>>   pci_assign_unassigned_resources;
>>     pci_enable_bridges()
>>       pci_enable_device()
>>
>>
>> So I think before the commit 928bea9 , the pci bridge device enable and disable is symmetrical.
>> After the commit 928bea9, we only enable bridge once, but still remove twice.
> 
> The port driver should be symmetrical, regardless of what happens
> outside it.  We have to be able to bind/unbind/bind/unbind
> indefinitely.
> 
> Do you think the patch is a problem for current upstream, or are you
> just saying it doesn't need to be backported as far as 2.6.32?  I
> frankly don't care that much if those old kernels pick it up or not.
> All I'm saying is that the problem this fixes is present that far
> back.
> 
> I'm not really interested in doing a lot more digging about ancient
> kernels, unless you think it's going to break something if applied to
> them.

As you said, this is not a big problem, it's ok for current upstream,
and for 3.4 3.5 3.10 stable tree, it seems just a enable and disable symmetry problem.
I don't find anything unsafe about this even though when we unbind pcie port driver,
the pcie port maybe still enable. I have no objection to backport it to ancient kernels. :)

Thanks!
Yijing.


> 
> Bjorn
> 
>
diff mbox

Patch

diff --git a/drivers/pci/pcie/portdrv_pci.c b/drivers/pci/pcie/portdrv_pci.c
index cd1e57e51aa7..0d8fdc48e642 100644
--- a/drivers/pci/pcie/portdrv_pci.c
+++ b/drivers/pci/pcie/portdrv_pci.c
@@ -223,7 +223,6 @@  static int pcie_portdrv_probe(struct pci_dev *dev,
 static void pcie_portdrv_remove(struct pci_dev *dev)
 {
 	pcie_port_device_remove(dev);
-	pci_disable_device(dev);
 }
 
 static int error_detected_iter(struct device *device, void *data)